G-CSF analog compositions and methods.



Provided herein are granulocyte colony stimulating factor ("G-CSF") analogs, compositions containing such
analogs, and related compositions. In another aspect, provided herein are nucleic acids encoding the present 
analogs or related nucleic acids, related host cells and vectors. In yet another aspect, provided herein are 
computer programs and apparatuses for expressing the three dimensional structure of G-CSF and analogs 
thereof. In another aspect, provided herein are methods for rationally designing G-CSF analogs and related 
compositions. In yet another aspect, provided herein are methods for treatment using the present G-CSF 
analogs. 
Field of the Invention

This invention relates to granulocyte colony stimulating factor ("G-CSF") analogs, compositions containing
such analogs, and related compositions. In another aspect, the present invention relates to nucleic acids
encoding the present analogs or related nucleic acids, related host cells and vectors. In another aspect, the
invention relates to computer programs and apparatuses for expressing the three dimensional structure of
G-CSF and analogs thereof. In another aspect, the invention relates to methods for rationally designing
G-CSF analogs and related compositions. In yet another aspect, the present invention relates to methods for
treatment using the present G-CSF analogs.

# Background 

Hematopoiesis is controlled by two systems: the cells within the bone marrow microenvironment and 
growth factors. The growth factors, also called colony stimulating factors, stimulate committed progenitor 
cells to proliferate and to form colonies of differentiating blood cells. One of these factors is granulocyte
colony stimulating factor, herein called G-CSF, which preferentially stimulates the growth and development 
of neutrophils, indicating a potential use in neutropenic states. Welte et al., PNAS-USA 82: 1526-1530 
(1985); Souza et al., Science 232: 61-65 (1986) and Gabrilove, J. Seminars in Hematology 26: (2) 1-14 
(1989). 

In humans, endogenous G-CSF is detectable in blood plasma. Jones et al., Bailliere's Clinical
Hematology 2 (1): 83-111 (1989). G-CSF is produced by fibroblasts, macrophages, T cells, trophoblasts,
endothelial cells and epithelial cells, and is the expression product of a single-copy gene comprising five
exons and four introns located on chromosome seventeen. Transcription of this locus produces an mRNA
species that is differentially processed, resulting in two forms of G-CSF mRNA: one coding for a
protein of 177 amino acids, the other coding for a protein of 174 amino acids. Nagata et al., EMBO J 5:
575-581 (1986). The 174 amino acid form has been found to have the greatest specific in vivo
biological activity. G-CSF is species cross-reactive: when human G-CSF is administered to
another mammal, such as a mouse, dog or monkey, sustained neutrophil leukocytosis is elicited. Moore
et al., PNAS-USA 84: 7134-7138 (1987).

Human G-CSF can be obtained and purified from a number of sources. Natural human G-CSF (nhG-CSF)
can be isolated from the supernatants of cultured human tumor cell lines. The development of
recombinant DNA technology, see, for instance, U.S. Patent 4,810,643 (Souza) incorporated herein by
reference, has enabled the production of commercial scale quantities of G-CSF in glycosylated form as a
product of eukaryotic host cell expression, and of G-CSF in non-glycosylated form as a product of
prokaryotic host cell expression.

G-CSF has been found to be useful in the treatment of indications where an increase in neutrophils will
provide benefits. For example, for cancer patients, G-CSF is beneficial as a means of selectively stimulating
neutrophil production to compensate for hematopoietic deficits resulting from chemotherapy or radiation
therapy. Other indications include treatment of various infectious diseases and related conditions, such as
sepsis, which is typically caused by a bacterial metabolite. G-CSF is also useful alone, or in combination
with other compounds such as other cytokines, for growth or expansion of cells in culture, for example, for
bone marrow transplants.

Signal transduction, the way in which G-CSF affects cellular metabolism, is not currently thoroughly
understood. G-CSF binds to a cell-surface receptor which apparently initiates the changes within particular
progenitor cells that lead to cell differentiation.

Various altered G-CSFs have been reported. Generally, in drug design, certain changes are known
to have certain structural effects. For example, deleting one cysteine could result in the unfolding of a
molecule which, in its unaltered state, is normally folded via a disulfide bridge. There are other known
methods for adding, deleting or substituting amino acids in order to change the function of a protein.

Recombinant human G-CSF mutants have been prepared, but the methods of preparation did not draw on
information about the overall structure/function relationship. For example, the mutation and biochemical
modification of Cys 18 has been reported. Kuga et al., Biochem. Biophys. Res. Commun. 159: 103-111 (1989);
Lu et al., Arch. Biochem. Biophys. 268: 81-92 (1989).

In U.S. Patent No. 4,810,643, entitled "Production of Pluripotent Granulocyte Colony-Stimulating
Factor" (cited above), polypeptide analogs and peptide fragments of G-CSF are disclosed generally.
Specific G-CSF analogs disclosed include those with the cysteines at positions 17, 36, 42, 64, and 74 (of the
174 amino acid species, or of the species having 175 amino acids, the additional amino acid being an
N-terminal methionine) substituted with another amino acid (such as serine), and G-CSF with an alanine in
the first (N-terminal) position.

EP 0 335 423 entitled "Modified human G-CSF" reportedly discloses the modification of at least one 
amino group in a polypeptide having hG-CSF activity. 

EP 0 272 703 entitled "Novel Polypeptide" reportedly discloses G-CSF derivatives having an amino
acid substituted or deleted at or "in the neighborhood" of the N terminus.

EP 0 459 630, entitled "Polypeptides," reportedly discloses derivatives of naturally occurring G-CSF
having at least one of the biological properties of naturally occurring G-CSF and a solution stability of at
least 35% at 5 mg/ml, in which the derivative has at least Cys17 of the native sequence replaced by a Ser17
residue and Asp27 of the native sequence replaced by a Ser27 residue.

EP 0 256 843 entitled "Expression of G-CSF and Muteins Thereof and Their Uses" reportedly discloses
a modified DNA sequence encoding G-CSF wherein the N-terminus is modified for enhanced expression of
protein in recombinant host cells, without changing the amino acid sequence of the protein.

EP 0 243 153 entitled "Human G-CSF Protein Expression" reportedly discloses G-CSF modified
by inactivating at least one yeast KEX2 protease processing site for increased yield in recombinant
production using yeast.

Shaw, U.S. Patent No. 4,904,584, entitled "Site-Specific Homogeneous Modification of Polypeptides," 
reportedly discloses lysine altered proteins. 

WO/9012874 reportedly discloses cysteine altered variants of proteins. 

Australian patent application Document No. AU-A-10948/92, entitled "Improved Activation of Recombinant
Proteins," reportedly discloses the addition of amino acids to either terminus of a G-CSF molecule for
the purpose of aiding in the folding of the molecule after prokaryotic expression.

Australian patent application Document No. AU-A-76380/91, entitled "Muteins of the Granulocyte
Colony Stimulating Factor (G-CSF)," reportedly discloses muteins of the granulocyte colony stimulating factor
G-CSF in the sequence Leu-Gly-His-Ser-Leu-Gly-Ile at positions 50-56 of G-CSF with 174 amino acids, and
positions 53-59 of G-CSF with 177 amino acids, and/or at least one of the four histidine residues at
positions 43, 79, 156 and 170 of the mature G-CSF with 174 amino acids or at positions 46, 82, 159, or 173
of the mature G-CSF with 177 amino acids.

GB 2 213 821, entitled "Synthetic Human Granulocyte Colony Stimulating Factor Gene" reportedly 
discloses a synthetic G-CSF-encoding nucleic acid sequence incorporating restriction sites to facilitate the 
cassette mutagenesis of selected regions, and flanking restriction sites to facilitate the incorporation of the
gene into a desired expression system. 

G-CSF has reportedly been crystallized to some extent, e.g., EP 344 796, and the overall structure of
G-CSF has been surmised, but only on a gross level. Bazan, Immunology Today 11: 350-354 (1990); Parry
et al., J. Molecular Recognition 8: 107-110 (1988). To date, there have been no reports of the overall
structure of G-CSF, and no systematic studies of the relationship between the overall structure and function
of the molecule, studies which are essential to the systematic design of G-CSF analogs. Accordingly, there
exists a need for a method for the systematic design of G-CSF analogs, and for the resulting compositions.

Summary of the Invention 

The three dimensional structure of G-CSF has now been determined to the atomic level. From this 
three-dimensional structure, one can now forecast with substantial certainty how changes in the composition 
of a G-CSF molecule may result in structural changes. These structural characteristics may be correlated 
with biological activity to design and produce G-CSF analogs. 

Although others had speculated regarding the three dimensional structure of G-CSF, Bazan, Immunology
Today 11: 350-354 (1990); Parry et al., J. Molecular Recognition 8: 107-110 (1988), these speculations
were of no help to those wishing to prepare G-CSF analogs, either because the surmised structure was
incorrect (Parry et al., supra) and/or because the surmised structure provided no detail correlating the
constituent moieties with structure. The present determination of the three-dimensional structure to the
atomic level is by far the most complete analysis to date, and provides important information to those
wishing to design and prepare G-CSF analogs. For example, from the present three dimensional structural
analysis, precise areas of hydrophobicity and hydrophilicity have been determined.

Relative hydrophobicity is important because it directly relates to the stability of the molecule.
Generally, biological molecules found in aqueous environments are externally hydrophilic and internally
hydrophobic; in accordance with the second law of thermodynamics, this is the lowest free-energy
state and provides for stability. Although one could have speculated that G-CSF's internal core would be
hydrophobic and the outer areas hydrophilic, one would have had no way of knowing the specific
hydrophobic or hydrophilic areas. With the presently provided knowledge of areas of hydrophobicity and
hydrophilicity, one may forecast with substantial certainty which changes to the G-CSF molecule will affect
the overall structure of the molecule.
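As a rough illustration of what per-residue hydrophobicity values look like, the sketch below computes a sliding-window hydropathy profile from sequence alone. This is a swapped-in technique, not the structure-based analysis described above: it assumes the Kyte-Doolittle hydropathy scale and a 7-residue window, both chosen only for illustration, and it cannot tell which residues are actually buried in the folded molecule.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list[float]:
    """Mean hydropathy of every full window along the sequence, left to right."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # The Leu-Gly-His-Ser-Leu-Gly-Ile segment cited earlier (positions 50-56).
    print(hydropathy_profile("LGHSLGI", window=7))
```

For that seven-residue segment a single window is returned, with a mean of about 1.04 on this scale; a sequence-only profile like this is at best a first hint, which is exactly the gap the atomic-level structure fills.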

As a general rule, one may use knowledge of the geography of the hydrophobic and hydrophilic regions
to design analogs in which the overall G-CSF structure is not changed, but the change does affect biological
activity ("biological activity" being used here in its broadest sense to denote function). One may correlate
biological activity to structure. If the structure is not changed and the mutation has no effect on biological
activity, then the mutated residue has no detectable biological function. If, however, the structure is not
changed and the mutation does affect biological activity, then the residue (or atom) is essential to at least
one biological function. Some of the present working examples were designed to produce no change in
overall structure, yet a change in biological function.
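The inference rule in the preceding paragraph can be written as a small decision function. This is a sketch only: the two boolean inputs stand for experimental findings (whether the overall fold changed, whether activity changed) that the text does not define quantitatively, and the "inconclusive" branch for a changed structure is an assumption, since the text draws conclusions only when the structure is unchanged.

```python
def interpret_analog(structure_changed: bool, activity_changed: bool) -> str:
    """Classify an analog using the structure/activity reasoning in the text (sketch)."""
    if structure_changed:
        # Assumption: with the overall fold altered, an activity change cannot be
        # attributed to the altered residue itself, so no conclusion is drawn.
        return "inconclusive"
    if activity_changed:
        # Fold preserved but activity altered: the residue (or atom) matters.
        return "essential to at least one function"
    # Fold and activity both preserved: no function detected for the residue.
    return "no detected function"


if __name__ == "__main__":
    for structure, activity in [(False, False), (False, True), (True, True)]:
        print(structure, activity, "->", interpret_analog(structure, activity))
```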

Based on the correlation of structure to biological activity, one aspect of the present invention relates to
G-CSF analogs. These analogs are molecules which have more, fewer, different or modified amino acid
residues compared with the G-CSF amino acid sequence. The modifications may be by addition, substitution,
or deletion of one or more amino acid residues. The modification may include the addition or substitution of
analogs of the amino acids themselves, such as peptidomimetics or amino acids with altered moieties such
as altered side groups. The G-CSF used as a basis for comparison may be of human, animal or
recombinant nucleic acid-technology origin (although the working examples disclosed herein are based on
the recombinant production of the 174 amino acid species of human G-CSF, having an extra N-terminal
methionyl residue). The analogs may possess functions different from those of the natural human G-CSF
molecule, or may exhibit the same functions, or varying degrees of the same functions. For example, the
analogs may be designed to have higher or lower biological activity, a longer shelf-life or decreased
stability, to be easier to formulate, or to be more difficult to combine with other ingredients. The analogs
may have no hematopoietic activity, and may therefore be useful as antagonists of the G-CSF effect (as, for
example, in the overproduction of G-CSF). From time to time herein the present analogs are referred to as
proteins or peptides for convenience, but contemplated herein are other types of molecules, such as
peptidomimetics or chemically modified peptides.

In another aspect, the present invention relates to related compositions containing a G-CSF analog as
an active ingredient. The term "related composition," as used herein, is meant to denote a composition
which may be obtained once the identity of the G-CSF analog is ascertained (such as a G-CSF analog
labeled with a detectable label, a related receptor or a pharmaceutical composition). Also considered
related compositions are chemically modified versions of the G-CSF analog, such as those having at least
one polyethylene glycol molecule attached.

For example, one may prepare a G-CSF analog to which a detectable label is attached, such as a 
fluorescent, chemiluminescent or radioactive molecule. 

Another example is a pharmaceutical composition which may be formulated by known techniques using
known materials; see, e.g., Remington's Pharmaceutical Sciences, 18th Ed. (1990, Mack Publishing Co.,
Easton, Pennsylvania 18042) pages 1435-1712, which are herein incorporated by reference. Generally, the
formulation will depend on a variety of factors, such as the route of administration, stability, production
concerns and other factors. The G-CSF analog may be administered by injection or by pulmonary
administration via inhalation. Enteric dosage forms may also be available for the present G-CSF analog
compositions, and therefore oral administration may be effective. G-CSF analogs may be inserted into
liposomes or other microcarriers for delivery, and may be formulated in gels or other compositions for
sustained release. Although preferred compositions will vary depending on the use to which the composition
will be put, generally, for G-CSF analogs having at least one of the biological activities of natural G-CSF,
preferred pharmaceutical compositions are those prepared for subcutaneous injection or for pulmonary
administration via inhalation, although the particular formulations for each type of administration will
depend on the characteristics of the analog.

Another example of a related composition is a receptor for the present analog. As used herein, the term
"receptor" indicates a moiety which selectively binds to the present analog molecule. For example,
antibodies, or fragments thereof, or "recombinant antibodies" (see Huse et al., Science 246: 1275 (1989))
may be used as receptors. Selective binding does not mean only specific binding (although binding-specific
receptors are encompassed herein), but rather that the binding is not a random event. Receptors may be on
the cell surface or intra- or extra-cellular, and may act to effectuate, inhibit or localize the biological activity
of the present analogs. Receptor binding may also be a triggering mechanism for a cascade of activity
indirectly related to the analog itself. Also contemplated herein are nucleic acids, vectors containing such
nucleic acids and host cells containing such nucleic acids which encode such receptors.

Another example of a related composition is a G-CSF analog with a chemical moiety attached.
Generally, chemical modification may alter the biological activity or antigenicity of a protein, or may alter
other characteristics, and these factors will be taken into account by a skilled practitioner. As noted above,
one example of such a chemical moiety is polyethylene glycol. Modification may include the addition of one
or more hydrophilic or hydrophobic polymer molecules, fatty acid molecules, or polysaccharide molecules.
Examples of chemical modifiers include polyethylene glycol, alkylpolyethylene glycols, DL-poly(amino acids),
polyvinylpyrrolidone, polyvinyl alcohol, pyran copolymer, acetic acid/acylation, propionic acid, palmitic acid,
stearic acid, dextran, carboxymethyl cellulose, pullulan, or agarose. See Francis, Focus on Growth
Factors 3: 4-10 (May 1992) (published by Mediscript, Mountview Court, Friern Barnet Lane, London N20
0LD, UK). Also, chemical modification may include an additional protein or portion thereof, use of a
cytotoxic agent, or an antibody. The chemical modification may also include lecithin.

In another aspect, the present invention relates to nucleic acids encoding such analogs. The nucleic
acids may be DNAs or RNAs or derivatives thereof, and will typically be cloned and expressed on a vector,
such as a phage or plasmid containing appropriate regulatory sequences. The nucleic acids may be labeled
(such as with a radioactive, chemiluminescent, or fluorescent label) for diagnostic or prognostic purposes,
for example. The nucleic acid sequence may be optimized for expression, such as by including codons
preferred for bacterial expression. The nucleic acid and its complementary strand, and modifications thereof
which do not prevent encoding of the desired analog, are contemplated here.

In another aspect, the present invention relates to host cells containing the above nucleic acids
encoding the present analogs. Host cells may be eukaryotic or prokaryotic, and expression systems may
include extra steps relating to the attachment (or prevention of attachment) of sugar groups (glycosylation),
proper folding of the molecule, the addition or deletion of leader sequences or other factors incident to
recombinant expression.

In another aspect the present invention relates to antisense nucleic acids which act to prevent or modify 
the type or amount of expression of such nucleic acid sequences. These may be prepared by known 
methods. 

In another aspect of the present invention, the nucleic acids encoding a present analog may be used for
gene therapy purposes, for example, by placing a vector containing the analog-encoding sequence into a
recipient so the nucleic acid itself is expressed inside the recipient who is in need of the analog
composition. The vector may first be placed in a carrier, such as a cell, and then the carrier placed into the
recipient. Such expression may be localized or systemic. Other carriers include non-naturally occurring
carriers, such as liposomes or other microcarriers or particles, which may act to mediate gene transfer into
a recipient.

The present invention also provides computer programs for the expression (such as visual display)
of the G-CSF or analog three dimensional structure, and further, a computer program which expresses the
identity of each constituent of a G-CSF molecule and the precise location of that constituent within the
overall structure, down to the atomic level. Set forth below is one example of such a program. There are
many currently available computer programs for the expression of the three dimensional structure of a
molecule. Generally, these programs provide for inputting the coordinates for the three dimensional
structure of a molecule (i.e., for example, a numerical assignment for each atom of a G-CSF molecule along
an x, y, and z axis), means to express (such as visually display) such coordinates, means to alter such
coordinates and means to express an image of a molecule having such altered coordinates. One may program
crystallography information, i.e., the coordinates of the location of the atoms of a G-CSF molecule in three
dimensional space, wherein such coordinates have been obtained from crystallographic analysis of said
G-CSF molecule, into such programs to generate a computer program for the expression (such as visual
display) of the G-CSF three dimensional structure. Also provided, therefore, is a computer program for the
expression of G-CSF analog three dimensional structure. Preferred is the computer program Insight II,
version 4, available from Biosym, San Diego, California, with the coordinates as set forth in FIGURE 5
input. The preferred expression means is a Silicon Graphics 320 VGX computer with Crystal Eyes glasses
(also available from Silicon Graphics), which allows one to view the G-CSF molecule or its analog
stereoscopically. Alternatively, the present G-CSF crystallographic coordinates and diffraction data are also
deposited in the Protein Data Bank, Chemistry Department, Brookhaven National Laboratory, Upton, New York
11973, USA. One may use these data in preparing a different computer program for expression of the three
dimensional structure of a G-CSF molecule or analog thereof. Therefore, another aspect of the present
invention is a computer program for the expression of the three dimensional structure of a G-CSF molecule.
Also provided is said computer program for visual display of the three dimensional structure of a G-CSF
molecule; and further, said program having means for altering such visual display. Apparatus useful for
expression of such computer program, particularly for the visual display of the computer image of said
three dimensional structure of a G-CSF molecule or analog thereof, is therefore also provided here, as well
as means for preparing said computer program and apparatus.
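A minimal sketch of the "input coordinates" and "alter coordinates" steps, assuming the coordinates are stored as ATOM records in the fixed-column Protein Data Bank format (the text mentions deposition in the Protein Data Bank but does not specify a file layout). The names Atom, parse_atoms, format_atom and translate are hypothetical and for illustration only; format_atom writes just the columns this parser reads rather than a complete PDB record, and no display step is shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # three-letter residue name
    chain: str
    res_seq: int   # residue number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed PDB columns (0-based slices)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def format_atom(serial: int, atom: Atom) -> str:
    """Write the columns parse_atoms reads (occupancy, B-factor, element omitted)."""
    return (f"ATOM  {serial:>5} {atom.name:<4} {atom.res_name:>3} {atom.chain}"
            f"{atom.res_seq:>4}    {atom.x:8.3f}{atom.y:8.3f}{atom.z:8.3f}")


def translate(atoms: list[Atom], dx: float, dy: float, dz: float) -> list[Atom]:
    """Return shifted copies: the simplest possible 'alter coordinates' operation."""
    return [replace(a, x=a.x + dx, y=a.y + dy, z=a.z + dz) for a in atoms]


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the G-CSF structure.
    record = format_atom(1, Atom("CA", "CYS", "A", 17, 1.0, 2.0, 3.0))
    atoms = parse_atoms(record)
    moved = translate(atoms, 0.5, 0.0, -1.0)
    print(format_atom(1, moved[0]))
```

Keeping Atom immutable means an "altered" structure is always a new list, so the unaltered coordinates remain available for side-by-side comparison.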
The computer program is useful for preparation of G-CSF analogs because one may select specific
sites on the G-CSF molecule for alteration and readily ascertain the effect the alteration will have on the
overall structure of the G-CSF molecule. Selection of said site for alteration will depend on the desired
biological characteristic of the G-CSF analog. If one were to randomly change said G-CSF molecule
(r-met-hu-G-CSF), there would be 175^20 possible substitutions, and even more analogs having multiple
changes, additions or deletions. By viewing the three dimensional structure wherein said structure is
correlated with the composition of the molecule, the selection of sites for alteration is no longer a random
event; rather, sites for alteration may be determined rationally.
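To give a sense of scale for random alteration, the sketch below counts variants under a deliberately simple model, which is an assumption rather than the text's own calculation: a fixed length of 175 residues, only the 20 standard amino acids, and substitutions only (no additions or deletions).

```python
from math import comb


def k_fold_substitutions(length: int, k: int, alphabet: int = 20) -> int:
    """Variants differing from the reference at exactly k positions."""
    # Choose which k positions change, then one of (alphabet - 1) new residues at each.
    return comb(length, k) * (alphabet - 1) ** k


def all_sequences(length: int, alphabet: int = 20) -> int:
    """Every sequence of the given length over the alphabet."""
    return alphabet ** length


if __name__ == "__main__":
    n = 175  # r-met-hu-G-CSF length, including the N-terminal methionine
    print("single substitutions:", k_fold_substitutions(n, 1))
    print("double substitutions:", k_fold_substitutions(n, 2))
    print("digits in 20**175:", len(str(all_sequences(n))))
```

Under this model there are 3,325 single and 5,496,225 double substitutions, and summing the k-fold counts over k = 0, ..., 175 recovers 20^175 (a 228-digit number) by the binomial theorem, which is why random screening quickly becomes impractical.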

As set forth above, identification of the three dimensional structure of G-CSF, including the placement of
each constituent down to the atomic level, has now yielded information regarding which moieties are
necessary to maintain the overall structure of the G-CSF molecule. One may therefore select whether to
maintain the overall structure of the G-CSF molecule when preparing a G-CSF analog of the present
invention, or whether (and how) to change the overall structure of the G-CSF molecule when preparing a
G-CSF analog of the present invention. Optionally, once one has prepared such an analog, one may test such
analog for a desired characteristic.

One may, for example, seek to maintain the overall structure possessed by a non-altered natural or
recombinant G-CSF molecule. The overall structure is presented in Figures 2, 3, and 4, and is described in
more detail below. Maintenance of the overall structure may ensure receptor binding, a necessary
characteristic for an analog possessing the hematopoietic capabilities of natural G-CSF (without receptor
binding, the presence of the analog does not result in signal transduction). It is contemplated that one
class of G-CSF analogs will possess the three dimensional core structure of a natural or recombinant
(non-altered) G-CSF molecule, yet possess different characteristics, such as an increased ability to
selectively stimulate neutrophils. Another class of G-CSF analogs comprises those with a different overall
structure which diminishes the ability of a G-CSF analog molecule to bind to a G-CSF receptor, and which
possess a diminished ability to selectively stimulate neutrophils as compared to non-altered natural or
recombinant G-CSF.

For example, it is now known which moieties within the internal regions of the G-CSF molecule are
hydrophobic, and, correspondingly, which moieties on the external portion of the G-CSF molecule are
hydrophilic. Without knowledge of the overall three dimensional structure, preferably to the atomic level as
provided herein, one could not forecast which alterations within this hydrophobic internal area would result
in a change in the overall structural conformation of the molecule. An overall structural change could result
in a functional change, such as lack of receptor binding, and therefore in diminishment of the biological
activity found in non-altered G-CSF. Another class of G-CSF analogs is therefore G-CSF analogs which
possess the same hydrophobicity as (non-altered) natural or recombinant G-CSF. More particularly, another
class of G-CSF analogs possesses the same hydrophobic moieties within the four helical bundle of its
internal core as those possessed by (non-altered) natural or recombinant G-CSF, yet has a composition
different from said non-altered natural or recombinant G-CSF.

Another example relates to external loops, which are structures that connect the helices of the internal
core of the G-CSF molecule. From the three dimensional structure (including information regarding the
spatial location of the amino acid residues), one may forecast that certain changes in certain loops will not
result in overall conformational changes. Therefore, another class of G-CSF analogs provided herein is that
having an altered external loop but possessing the same overall structure as (non-altered) natural or
recombinant G-CSF. More particularly, another class of G-CSF analogs provided herein comprises those
having an altered external loop, said loop being selected from the loop present between helices A and B;
between helices B and C; between helices C and D; and between helices D and A, as those loops and
helices are identified herein. More particularly, said loops, preferably the AB loop and/or the CD loop, are
altered to increase the half-life of the molecule by stabilizing said loops. Such stabilization may be by
connecting all or a portion of said loop(s) to a portion of the alpha helical bundle found in the core of a
G-CSF (or analog) molecule. Such connection may be via beta sheet, salt bridge, disulfide bonds,
hydrophobic interaction or other connecting means available to those skilled in the art, wherein such
connecting means serves to stabilize said external loop or loops. For example, one may stabilize the AB or
CD loops by connecting the AB loop to one of the helices within the internal region of the molecule.

The N-terminus also may be altered without change in the overall structure of a G-CSF molecule,
because the N-terminus does not affect the structural stability of the internal helices; although the external
loops are preferred for modification, the same general statements apply to the N-terminus.

Additionally, such external loops may be the site(s) for chemical modification because in (non-altered)
natural or recombinant G-CSF such loops are relatively flexible and tend not to interfere with receptor
binding. Thus, there would be additional room for a chemical moiety to be attached directly (or indirectly,
via another chemical moiety which serves as a chemical connecting means). The chemical moiety
may be selected from a variety of moieties available for modification of one or more functions of a G-CSF
molecule. For example, an external loop may provide sites for the addition of one or more polymers which
serve to increase serum half-life, such as polyethylene glycol molecules. Such polyethylene glycol
molecule(s) may be added wherein said loop is altered to include additional lysines, which have reactive
side groups to which polyethylene glycol moieties are capable of attaching. Other classes of chemical
moieties may also be attached to one or more external loops, including but not limited to other biologically
active molecules, such as receptors, other therapeutic proteins (such as other hematopoietic factors, which
would engender a hybrid molecule), or cytotoxic agents (such as diphtheria toxin). This list is of course not
complete; one skilled in the art possessed of the desired chemical moiety will have the means to effect
attachment of said desired moiety to the desired external loop. Therefore, another class of the present
G-CSF analogs includes those with at least one alteration in an external loop wherein said alteration
provides for the addition of a chemical moiety such as at least one polyethylene glycol molecule.
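A minimal sketch of the lysine-based attachment idea, assuming 1-based residue numbering and a loop given as an inclusive (start, end) range. The loop boundaries in the example are hypothetical placeholders (the loops are identified later in the document, not here), the toy sequence is not G-CSF, and the sketch only locates candidate lysine side chains; it says nothing about the coupling chemistry.

```python
def lysines_in_loop(sequence: str, start: int, end: int) -> list[int]:
    """1-based positions of Lys (K) residues within the inclusive range start..end."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("loop range out of bounds")
    return [pos for pos in range(start, end + 1) if sequence[pos - 1] == "K"]


def substitute_lysine(sequence: str, position: int) -> str:
    """Return the sequence with the residue at a 1-based position replaced by Lys."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position out of bounds")
    return sequence[:position - 1] + "K" + sequence[position:]


if __name__ == "__main__":
    toy = "MTKGSKLE"   # toy sequence, not G-CSF
    loop = (2, 6)      # hypothetical loop boundaries
    print(lysines_in_loop(toy, *loop))
    print(lysines_in_loop(substitute_lysine(toy, 4), *loop))
```

Here the toy loop starts with lysines at positions 3 and 6, and introducing a lysine at position 4 adds a third candidate attachment site.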

Deletions, such as deletions of sites recognized by proteins for degradation of the molecule, may also
be effected in the external loops. This provides an alternative means for increasing the half-life of a
molecule otherwise having the G-CSF receptor binding and signal transduction capabilities (i.e., the ability
to selectively stimulate the maturation of neutrophils). Therefore, another class of the present G-CSF
analogs includes those with at least one alteration in an external loop wherein said alteration decreases the
turnover of said analog by proteases. Preferred loops for such alterations are the AB loop and the CD loop.
One may prepare an abbreviated G-CSF molecule by deleting a portion of the amino acid residues found in
the external loops (identified in more detail below); said abbreviated G-CSF molecule may have additional
advantages in preparation or in biological function.
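The text does not name the proteases involved in G-CSF turnover. Purely as an example, the sketch below uses trypsin's commonly cited specificity (cleavage after Lys or Arg, but not when the next residue is Pro) to list candidate cleavage sites inside a loop. The choice of protease, the simplified rule, the loop range and the toy sequence are all assumptions for illustration.

```python
def trypsin_sites(sequence: str) -> list[int]:
    """1-based positions after which trypsin is expected to cut (K/R not followed by P)."""
    return [
        i + 1
        for i in range(len(sequence) - 1)  # the last residue has no bond after it
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]


def sites_in_loop(sequence: str, start: int, end: int) -> list[int]:
    """Keep only candidate sites whose cleaved-after residue lies in the inclusive loop range."""
    return [pos for pos in trypsin_sites(sequence) if start <= pos <= end]


if __name__ == "__main__":
    toy = "AKPRGKLRE"  # toy sequence, not G-CSF
    print(trypsin_sites(toy))
    print(sites_in_loop(toy, 3, 6))
```

In the toy sequence, Lys2 is skipped because Pro follows it, leaving sites after positions 4, 6 and 8; only 4 and 6 fall inside the hypothetical loop, so those would be the candidates for deletion or substitution.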

Another example relates to the relative charges of amino acid residues which are in proximity to
each other. As noted above, the G-CSF molecule contains a relatively tightly packed four helical bundle.
Some faces of the helices face other helices. At a point (such as a residue) where a helix faces
another helix, the two amino acid moieties which face each other may have the same charge, and thus tend
to repel each other, which lends instability to the overall molecule. This may be eliminated by changing the
charge (to an opposite charge or a neutral charge) of one or both of the amino acid moieties so that there is
no repulsion. Therefore, another class of G-CSF analogs includes those G-CSF analogs altered to reduce
instability due to surface interactions, such as the location of electric charges.
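The charge argument above can be sketched as a check over pairs of facing residues. The pairs themselves would have to come from the three dimensional structure; here they are hypothetical, as is the simplification that only Lys and Arg (+1) and Asp and Glu (-1) are charged near neutral pH, with His treated as neutral.

```python
# Approximate side-chain charges near neutral pH (simplification: His treated as 0).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def like_charge_pairs(sequence: str,
                      facing_pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return facing pairs (1-based) whose side chains carry the same nonzero charge."""
    flagged = []
    for i, j in facing_pairs:
        qi = SIDE_CHAIN_CHARGE.get(sequence[i - 1], 0)
        qj = SIDE_CHAIN_CHARGE.get(sequence[j - 1], 0)
        if qi * qj > 0:  # same sign and both nonzero: expected to repel
            flagged.append((i, j))
    return flagged


if __name__ == "__main__":
    toy = "KAEDRK"                              # toy sequence, not G-CSF
    pairs = [(1, 6), (3, 4), (1, 3), (2, 5)]    # hypothetical helix-helix contacts
    print(like_charge_pairs(toy, pairs))
```

Each flagged pair is a candidate for changing one residue to a neutral or oppositely charged residue, as the paragraph describes; the unflagged Lys-Glu pair is the kind of attraction such a change would aim to create.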

In another aspect, the present invention relates to methods for designing G-CSF analogs and related compositions, and to the products of those methods. The end products of the methods may be the G-CSF analogs as defined above or related compositions. For instance, the examples disclosed herein demonstrate (a) the effects of changes in the constituents (i.e., chemical moieties) of the G-CSF molecule on the G-CSF structure and (b) the effects of changes in structure on biological function. Therefore, another aspect of the present invention is a method for preparing a G-CSF analog comprising the steps of:

(a) viewing information conveying the three dimensional structure of a G-CSF molecule wherein the 
chemical moieties, such as each amino acid residue or each atom of each amino acid residue, of the G- 
CSF molecule are correlated with said structure; 

(b) selecting from said information a site on a G-CSF molecule for alteration;

(c) preparing a G-CSF analog molecule having such alteration; and 

(d) optionally, testing such G-CSF analog molecule for a desired characteristic. 

One may use the computer programs provided herein in a computer-based method for preparing a G-CSF analog. Another aspect of the present invention is therefore a computer-based method for preparing a G-CSF analog comprising the steps of:

(a) providing computer expression of the three dimensional structure of a G-CSF molecule wherein the 
chemical moieties, such as each amino acid residue or each atom of each amino acid residue, of the G- 
CSF molecule are correlated with said structure; 

(b) selecting from said computer expression a site on a G-CSF molecule for alteration; 
(c) preparing a G-CSF molecule having such alteration; and

(d) optionally, testing such G-CSF molecule for a desired characteristic. 
More specifically, the present invention provides a method for preparing a G-CSF analog comprising 
the steps of: 

(a) viewing the three dimensional structure of a G-CSF molecule via a computer, said computer programmed (i) to express the coordinates of a G-CSF molecule in three dimensional space, and (ii) to allow for entry of information for alteration of said G-CSF expression and viewing thereof;

(b) selecting a site on said visual image of said G-CSF molecule for alteration; 

(c) entering information for said alteration on said computer; 
(d) viewing a three dimensional structure of said altered G-CSF molecule via said computer;

(e) optionally repeating steps (a)-(d);

(f) preparing a G-CSF analog with said alteration; and 

(g) optionally testing said G-CSF analog for a desired characteristic. 

In another aspect, the present invention relates to methods of using the present G-CSF analogs and related compositions for the treatment or protection of mammals, either alone or in combination with other hematopoietic factors or drugs, in the treatment of hematopoietic disorders. It is contemplated that one goal of designing G-CSF analogs will be to enhance or modify the characteristics that non-modified G-CSF is known to have.

For example, the present analogs may possess enhanced or modified activities; thus, where G-CSF is useful in the treatment of (for example) neutropenia, the present compositions and methods may also be of such use.

Another example is the modification of G-CSF so that it interacts more effectively when used in combination with other factors, particularly in the treatment of hematopoietic disorders. One example of such combination use is to administer an early-acting hematopoietic factor (i.e., a factor that acts earlier in the hematopoiesis cascade on relatively undifferentiated cells) together with, either simultaneously or in seriatim, a later-acting hematopoietic factor such as G-CSF or an analog thereof (as G-CSF acts on the CFU-GM lineage in the selective stimulation of neutrophils). The present methods and compositions may be useful in therapy involving such combinations or "cocktails" of hematopoietic factors.

The present compositions and methods may also be useful in the treatment of leukopenia, myelogenous leukemia, severe chronic neutropenia, aplastic anemia, glycogen storage disease, mucositis, and other bone marrow failure states. The present compositions and methods may also be useful in the treatment of hematopoietic deficits arising from chemotherapy or from radiation therapy. The success of bone marrow transplantation, or of the use of peripheral blood progenitor cells for transplantation, for example, may be enhanced by application of the present compositions (proteins or nucleic acids for gene therapy) and methods. The present compositions and methods may also be useful in the treatment of infectious diseases, such as in the context of wound healing, burn treatment, bacteremia, septicemia, fungal infections, endocarditis, osteomyelitis, infection related to abdominal trauma, infections not responding to antibiotics, and pneumonia; the treatment of bacterial inflammation may also benefit from the application of the present compositions and methods. In addition, the present compositions and methods may be useful in the treatment of leukemia, based upon a reported ability of G-CSF to differentiate leukemic cells. Welte et al., PNAS-USA 82: 1526-1530 (1985). Other applications include the treatment of individuals with tumors using the present compositions and methods, optionally in the presence of receptors (such as antibodies) that bind to the tumor cells. For review articles on therapeutic applications, see Lieschke and Burgess, N. Engl. J. Med. 327: 28-34 and 99-106 (1992), both of which are herein incorporated by reference.

The present compositions and methods may also be useful to act as intermediaries in the production of 
other moieties; for example, G-CSF has been reported to influence the production of other hematopoietic 
factors and this function (if ascertained) may be enhanced or modified via the present compositions and/or 
methods. 

The compositions related to the present G-CSF analogs, such as receptors, may be useful as antagonists that prevent the activity of G-CSF or an analog. One may obtain a composition with some or all of the activity of non-altered G-CSF or a G-CSF analog, and add one or more chemical moieties to alter one or more properties of such G-CSF or analog. With knowledge of the three dimensional conformation, one may forecast the best location on the molecule for such chemical modification to achieve the desired effect.

General objectives in chemical modification may include improved half-life (such as reduced renal, immunological or cellular clearance), altered bioactivity (such as altered enzymatic properties, dissociated bioactivities or activity in organic solvents), reduced toxicity (such as concealing toxic epitopes, compartmentalization, and selective biodistribution), altered immunoreactivity (reduced immunogenicity, reduced antigenicity or adjuvant action), or altered physical properties (such as increased solubility, improved thermal stability, improved mechanical stability, or conformational stabilization). See Francis, Focus on Growth Factors 3: 4-10 (May 1992) (published by Mediscript, Mountview Court, Friern Barnet Lane, London N20 0LD, UK).

The examples below are illustrative of the present invention and are not intended as a limitation. It is understood that variations and modifications will occur to those skilled in the art, and it is intended that the appended claims cover all such equivalent variations that come within the scope of the invention as claimed.
Detailed Description of the Drawings

FIGURE 1 is an illustration of the amino acid sequence of the 174 amino acid species of G-CSF with an additional N-terminal methionine (Seq. ID No.: 1) (Seq. ID No.: 2).

FIGURE 2 is a topology diagram of the crystalline structure of G-CSF, as well as hGH, pGH, GM-CSF, IFN-β, IL-2, and IL-4. These illustrations are based on inspection of the cited references. The lengths of the secondary structural elements are drawn in proportion to the number of residues. The A, B, C, and D helices are labeled according to the scheme used herein for G-CSF. For IFN-β, the original labeling of the helices is indicated in parentheses.

FIGURE 3 is a "ribbon diagram" of the three dimensional structure of G-CSF. Helix A is amino acid residues 11-39 (numbered according to FIGURE 1, above), helix B is amino acid residues 72-91, helix C is amino acid residues 100-123, and helix D is amino acid residues 143-173. The relatively short 3₁₀ helix is at amino acid residues 45-48, and the alpha helix is at amino acid residues 48-53. Residues 93-95 form almost one turn of a left-handed helix.
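As a minimal sketch, the helix boundaries just listed can be used to classify any residue as lying in a main helix, in one of the loops between consecutive helices, or in a terminal region. This is useful when selecting sites in the external loops. Treating every residue between two consecutive main helices as "loop" is an assumption of this sketch: it places the short 3₁₀ and alpha helices (residues 45-53) inside the AB loop. The function names are illustrative only.

```python
# Main helix boundaries as listed for FIGURE 3 (FIGURE 1 numbering, inclusive).
HELICES = {"A": (11, 39), "B": (72, 91), "C": (100, 123), "D": (143, 173)}


def inter_helix_loops(helices=HELICES):
    """Return the residue ranges between consecutive main helices, e.g. {"AB": (40, 71)}."""
    names = sorted(helices, key=lambda name: helices[name][0])
    loops = {}
    for first, second in zip(names, names[1:]):
        start, end = helices[first][1] + 1, helices[second][0] - 1
        if start <= end:
            loops[first + second] = (start, end)
    return loops


def segment_of(residue, helices=HELICES):
    """Classify a residue as lying in a main helix, an inter-helix loop, or a terminus."""
    for name, (lo, hi) in helices.items():
        if lo <= residue <= hi:
            return f"helix {name}"
    for name, (lo, hi) in inter_helix_loops(helices).items():
        if lo <= residue <= hi:
            return f"{name} loop"
    return "terminal region"


print(inter_helix_loops())  # {'AB': (40, 71), 'BC': (92, 99), 'CD': (124, 142)}
print(segment_of(45))       # AB loop
```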

FIGURE 4 is a "barrel diagram" of the three dimensional structure of G-CSF. Shown in various shades of gray are the overall cylinders and their orientations for the three dimensional structure of G-CSF. The numbers indicate amino acid residue positions according to FIGURE 1 above.

FIGURE 5 is a list of the coordinates used to generate a computer-aided visual image of the three-dimensional structure of G-CSF. The coordinates are set forth below. The columns correspond to separate fields:

(i) Field 1 (from the left hand side) is the atom;

(ii) Field 2 is the assigned atom number;

(iii) Field 3 is the atom name (according to periodic table nomenclature, with CB being the beta carbon atom, CG the gamma carbon atom, etc.);

(iv) Field 4 is the residue type (according to three letter nomenclature for amino acids as found in, e.g., Stryer, Biochemistry, 3d Ed., W.H. Freeman and Company, N.Y. 1988, inside back cover);

(v) Fields 5-7 are the x-axis, y-axis and z-axis positions of the atom;

(vi) Field 8 (often "1.00") designates occupancy at that position;

(vii) Field 9 designates the B-factor;

(viii) Field 10 designates the molecule. Three molecules (designated a, b, and c) of G-CSF crystallized together as a unit. The designation a, b, or c indicates which coordinates are from which molecule. The number after the letter (1, 2, or 3) indicates the assigned amino acid residue position, with molecule A having assigned positions 10-175, molecule B having assigned positions 210-375, and molecule C having assigned positions 410-575. These positions were so designated so that there would be no overlap among the three molecules that crystallized together. (The "W" designation indicates water.)
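The following is a minimal sketch of reading one coordinate line laid out in the ten-field order described above. It also maps the assigned position back to a molecule and residue using the stated ranges (10-175, 210-375, 410-575). Several things are assumptions made for illustration: that the fields are whitespace-separated, that field 10 is written as a letter followed by the position number (for example "B215"), and the example line itself with its coordinates. The actual layout of FIGURE 5 may differ.

```python
import re
from dataclasses import dataclass

# Position ranges stated for field 10: (first, last, offset subtracted to get the residue).
MOLECULE_RANGES = {"A": (10, 175, 0), "B": (210, 375, 200), "C": (410, 575, 400)}


@dataclass
class AtomRecord:
    record: str
    atom_number: int
    atom_name: str
    residue_type: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    molecule: str
    residue: int


def decode_position(position):
    """Map an assigned position (e.g. 215) to (molecule, residue), e.g. ("B", 15)."""
    for molecule, (first, last, offset) in MOLECULE_RANGES.items():
        if first <= position <= last:
            return molecule, position - offset
    raise ValueError(f"position {position} is outside the stated ranges")


def parse_coordinate_line(line):
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    match = re.fullmatch(r"([A-Za-z])(\d+)", fields[9])
    if match is None:
        raise ValueError(f"unrecognized designation {fields[9]!r}")
    letter, number = match.group(1).upper(), int(match.group(2))
    if letter == "W":  # water: keep the number as given
        molecule, residue = "W", number
    else:
        molecule, residue = decode_position(number)
    return AtomRecord(
        fields[0], int(fields[1]), fields[2], fields[3],
        float(fields[4]), float(fields[5]), float(fields[6]),
        float(fields[7]), float(fields[8]), molecule, residue,
    )


# Hypothetical record in the ten-field order described above (values invented).
record = parse_coordinate_line("ATOM 17 CB LEU 12.345 -3.210 7.890 1.00 25.40 B215")
print(record.molecule, record.residue, record.atom_name)  # B 15 CB
```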

FIGURE 6 is a schematic representation of the strategy involved in refining the crystallization matrix for parameters involved in crystallization. The crystallization matrix corresponds to the final concentrations of the components (salts, buffers and precipitants) of the crystallization solutions in the wells of a 24 well tissue culture plate. These concentrations are produced by pipetting the appropriate volumes of stock solutions into the wells of the microtiter plate. To design the matrix, the crystallographer decides on an upper and a lower concentration of the component. These upper and lower concentrations can be pipetted along either the rows (e.g., A1-A6, B1-B6, C1-C6 or D1-D6) or the entire tray (A1-D6). The former method is useful for checking reproducibility of crystal growth of a single component along a limited number of wells, whereas the latter method is more useful in initial screening. The results of several stages of refinement of the crystallization matrix are illustrated by a representation of three plates. Increased shading in the wells indicates a positive crystallization result, which in the final stages would be X-ray quality crystals but in the initial stages could be oil droplets, granular precipitates or small crystals smaller than approximately 0.05 mm. Part A represents an initial screen of one parameter in which the range of concentration between the first well (A1) and the last well (D6) is large, and the concentration increase between wells is calculated as ((concentration D6) - (concentration A1))/23. Part B represents a later stage of matrix refinement, in which the concentration spread between A1 and D6 is reduced, resulting in more crystals formed per plate. Part C indicates a final stage of matrix refinement in which quality crystals are found in most wells of the plate.
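A minimal sketch of the two pipetting layouts described for FIGURE 6 follows. The first spreads a linear gradient over all 24 wells (A1 to D6, so 23 equal steps). The second spreads a gradient along a single row of six wells. Filling the whole tray in row-major order (A1-A6, then B1-B6, and so on) is an assumption of this sketch. The example uses the 0-2.5 M NaCl range that this specification gives as a preferred starting range. The function names are illustrative only.

```python
def well_labels(rows="ABCD", columns=6):
    """Well names in row-major order: A1..A6, B1..B6, ..., D1..D6."""
    return [f"{row}{col}" for row in rows for col in range(1, columns + 1)]


def tray_gradient(conc_a1, conc_d6, rows="ABCD", columns=6):
    """Linear gradient across the whole tray; for 24 wells the step is (D6 - A1) / 23."""
    labels = well_labels(rows, columns)
    step = (conc_d6 - conc_a1) / (len(labels) - 1)
    return {label: conc_a1 + i * step for i, label in enumerate(labels)}


def row_gradient(row, conc_first, conc_last, columns=6):
    """Linear gradient along a single row, e.g. B1..B6."""
    step = (conc_last - conc_first) / (columns - 1)
    return {f"{row}{col}": conc_first + (col - 1) * step for col in range(1, columns + 1)}


# Initial whole-tray NaCl screen over 0-2.5 M.
nacl = tray_gradient(0.0, 2.5)
print(round(nacl["A2"] - nacl["A1"], 4))  # 0.1087 (M per well)
```

Narrowing the two end concentrations in later rounds corresponds to parts B and C of the figure.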

Detailed Description of the Invention

The present invention grows out of the discovery of the three dimensional structure of G-CSF. This three dimensional structure has been expressed via computer program for stereoscopic viewing. By viewing this structure stereoscopically, structure-function relationships have been identified, and G-CSF analogs have been designed and made.

The Overall Three Dimensional Structure of G-CSF 

The G-CSF used to ascertain the structure was a non-glycosylated 174 amino acid species having an extra N-terminal methionine residue incident to bacterial expression. The DNA and amino acid sequences of this G-CSF are illustrated in FIGURE 1.

Overall, the three dimensional structure of G-CSF is predominantly helical, with 103 of the 175 residues forming a 4-alpha-helical bundle. The only other secondary structure is found in the loop between the first two long helices, where a 4 residue 3₁₀ helix is immediately followed by a 6 residue alpha helix. As shown in FIGURE 2, the overall structure has been compared with the structures reported for other proteins: growth hormone (Abdel-Meguid et al., PNAS-USA 84: 6434 (1987) and Vos et al., Science 255: 305-312 (1992)), granulocyte macrophage colony stimulating factor (Diederichs et al., Science 254: 1779-1782 (1991)), interferon-β (Senda et al., EMBO J. 11: 3193-3201 (1992)), interleukin-2 (McKay, Science 257: 1673-1677 (1992)) and interleukin-4 (Powers et al., Science 256: 1673-1677 (1992), and Smith et al., J. Mol. Biol. 224: 899-904 (1992)). Structural similarity among these growth factors occurs despite the absence of similarity in their amino acid sequences.

The structural information was correlated with G-CSF biochemistry, which can be summarized as follows (with sequence position 1 being at the N-terminus):



Sequence Position   | Description of Structure                          | Analysis
--------------------|---------------------------------------------------|-----------------------------------------------------------------------
1-10                | Extended chain                                    | Deletion causes no loss of biological activity
Cys 18              | Partially buried                                  | Reactive with DTNB and thimerosal but not with iodoacetate
34                  | Alternative splice site                           | Insertion reduces biological activity
20-47 (inclusive)   | Helix A, first disulfide and portion of AB helix  | Predicted receptor binding region based on neutralizing antibody data
20, 23, 24          | Helix A                                           | Single alanine mutation of residue(s) reduces biological activity; predicted receptor binding (Site B)
165-175 (inclusive) | Carboxy terminus                                  | Deletion reduces biological activity



This biochemical information, gleaned from antibody binding studies (see Layton et al., Biochemistry 266: 23815-23823 (1991)), was superimposed on the three-dimensional structure in order to design G-CSF analogs. The design, preparation, and testing of these G-CSF analogs are described in Example 1 below.

EXAMPLE 1 

This Example describes the preparation of crystalline G-CSF, the visualization of the three dimensional structure of recombinant human G-CSF via computer-generated image, the preparation of analogs using site-directed mutagenesis or nucleic acid amplification methods, the biological assays and HPLC analysis used to analyze the G-CSF analogs, and the resulting determination of overall structure/function relationships. All cited publications are herein incorporated by reference.

A. Use of Automated Crystallization

The need for a three-dimensional structure of recombinant human granulocyte colony stimulating factor (r-hu-G-CSF), and the availability of large quantities of the purified protein, led to methods of crystal growth by incomplete factorial sampling and seeding. Starting with the implementation of incomplete factorial crystallization described by Jancarik and Kim, J. Appl. Crystallogr. 24: 409 (1991), solution conditions that yielded oil droplets and birefringent aggregates were identified. In addition, the software and hardware of an automated pipetting system were modified to produce some 400 different crystallization conditions per day. Weber, J. Appl. Crystallogr. 20: 366-373 (1987). This procedure led to a crystallization solution that produced r-hu-G-CSF crystals.

The size, reproducibility and quality of the crystals were improved by a seeding method in which the number of "nucleation initiating units" was estimated by serial dilution of a seeding solution. These methods yielded reproducible growth of 2.0 mm r-hu-G-CSF crystals. The space group of these crystals is P2₁2₁2₁ with cell dimensions of a = 90 Å, b = 110 Å and c = 49 Å, and they diffract to a resolution of 2.0 Å.

1. Overall Methodology 

To search for the crystallizing conditions of a new protein, Carter and Carter, J. Biol. Chem. 254: 12219-12223 (1979), proposed the incomplete factorial method. They suggested that sampling a large number of randomly selected, but generally probable, crystallizing conditions may lead to a successful combination of reagents that produces protein crystallization. This idea was implemented by Jancarik and Kim, J. Appl. Crystallogr. 24: 409 (1991), who described 32 solutions for initial crystallization trials that cover a range of pH, salts and precipitants. Here we describe an extension of their implementation to an expanded set of 70 solutions. To minimize the human effort and error of solution preparation, the method has been programmed for an automatic pipetting machine.

Following Weber's method of successive automated grid searching (SAGS), J. Cryst. Growth 90: 318-324 (1988), the robotic system was used to generate a series of solutions that continually refined the crystallization conditions of temperature, pH, salts and precipitant. Once a solution that could reproducibly grow crystals was determined, a seeding technique that greatly improved the quality of the crystals was developed. When these methods were combined, hundreds of diffraction quality crystals (crystals diffracting to at least about 2.5 Angstroms, preferably having at least portions diffracting to below 2 Angstroms, and more preferably to approximately 1 Angstrom) were produced in a few days.

Generally, the method for crystallization, which may be used with any protein one desires to crystallize, comprises the steps of:

(a) combining aqueous aliquots of the desired protein with either (i) aliquots of a salt solution, each aliquot having a different concentration of salt; or (ii) aliquots of a precipitant solution, each aliquot having a different concentration of precipitant, optionally wherein each combined aliquot is combined in the presence of a range of pH;

(b) observing said combined aliquots for precrystalline formations, and selecting the salt or precipitant combination and the pH that are efficacious in producing precrystalline forms, or, if no precrystalline forms are so produced, increasing the protein starting concentration of said aqueous aliquots of protein;

(c) after said salt or said precipitant concentration is selected, repeating step (a) with the previously unselected solution in the presence of said selected concentration; and

(d) repeating steps (a) and (b) until a crystal of desired quality is obtained.

The above method may optionally be automated, which provides vast savings in time and labor. Preferred protein starting concentrations are between 10 mg/ml and 20 mg/ml; however, this starting concentration will vary with the protein (the G-CSF below was analyzed using 33 mg/ml). A preferred salt range with which to begin analysis is 0-2.5 M NaCl. A preferred precipitant is polyethylene glycol 8000; other precipitants include organic solvents (such as ethanol), polyethylene glycol molecules having a molecular weight in the range of 500-20,000, and other precipitants known to those skilled in the art. The preferred pH values are 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, and 9.0. Precrystallization forms include oils, birefringent precipitates, small crystals (< approximately 0.05 mm), medium crystals (approximately 0.05 to 0.5 mm) and large crystals (> approximately 0.5 mm). The preferred time for waiting to see a crystalline structure is 48 hours, although weekly observation is also preferred, and generally, after about one month, a different protein concentration is utilized (generally the protein concentration is increased). Automation is preferred, using the Accuflex system as modified. The preferred automation parameters are described below.

Generally, protein at a concentration between 10 mg/ml and 20 mg/ml was combined with a range of NaCl solutions from 0-2.5 M, and each such combination was performed (separately) in the presence of the above range of concentrations. Once a precrystallization structure is observed, that salt concentration and pH range are optimized in a separate experiment until the desired crystal quality is achieved. Next, the precipitant concentration, in the presence of varying levels of pH, is also optimized. When both are optimized, the optimal conditions are combined to achieve the desired result (this is diagrammed in FIGURE 6).

a. Implementation of an automated pipetting system 

Drops and reservoir solutions were prepared by an Accuflex pipetting system (ICN Pharmaceuticals, Costa Mesa, CA), which is controlled by a personal computer that sends ASCII codes through a standard serial interface. The pipetter samples six different solutions by means of a rotating valve and pipettes these solutions onto a plate whose translation in an x-y coordinate system can be controlled. The vertical component of the system manipulates a syringe that is capable of both dispensing and retrieving liquid.

The software provided with the Accuflex was based on the SAGS method as proposed by Cox and Weber, J. Appl. Crystallogr. 20: 366-373 (1987). This method involves the systematic variation of two major crystallization parameters, pH and precipitant concentration, with provision to vary two others. While building on these concepts, the software used here provided greater flexibility in the design and implementation of the crystallization solutions used in the automated grid searching strategy. As a result of this flexibility, the present software also created a larger number of different solutions. This is essential for the implementation of the incomplete factorial method described below.

To improve the speed and design of the automated grid searching strategy, the Accuflex pipetting system required software and hardware modifications. The hardware changes allowed the use of two different micro-titer trays, one used for hanging drop and one used for sitting drop experiments, and a Plexiglas tray that held 24 additional buffer, salt and precipitant solutions. These additional solutions expanded the grid of crystallizing conditions that could be surveyed.

To utilize the hardware modifications, the pipetting software was written as two subroutines. One subroutine allows the crystallographer to design a matrix of crystallization solutions based on the concentrations of their components, and the second subroutine translates these concentrations into the computer code that pipettes the proper volumes of the solutions into the crystallization trays. The concentration matrices can be generated by either of two programs. The first program (MRF, available from Amgen, Inc., Thousand Oaks, CA) refers to a list of stock solution concentrations supplied by the crystallographer and calculates the volume that must be pipetted to achieve the designated concentration. The second method, which is preferred, incorporates a spreadsheet program (Lotus™) that can be used to make more sophisticated gradients of precipitants or pH. The concentration matrix created by either program is interpreted by the control program (SUX, a modification of the program originally supplied with the Accuflex pipetter and available from Amgen, Inc., Thousand Oaks, CA), and the wells are filled accordingly.
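The volume calculation attributed to MRF above amounts, at its simplest, to the dilution relation C1 × V1 = C2 × V2 applied to each stock, with water making up the remainder of the well. The following minimal sketch shows that arithmetic. It is not the MRF program itself. The stock strengths and the 1000 µl reservoir volume are hypothetical. The targets are the crystal growth solution reported later in this Example, and each component's target and stock must be in the same units.

```python
def pipetting_volumes(targets, stocks, well_volume_ul):
    """Volume of each stock (C1*V1 = C2*V2) plus water to reach the final well volume.

    targets and stocks map component -> concentration; each component's target and
    stock must share units (mM, % w/v, ...).
    """
    volumes = {}
    for component, target in targets.items():
        stock = stocks[component]
        if target > stock:
            raise ValueError(f"{component}: target {target} exceeds stock {stock}")
        volumes[component] = target * well_volume_ul / stock
    water = well_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute to fit in the well volume")
    volumes["water"] = water
    return volumes


# Targets from the reported growth solution; stock strengths and volume are hypothetical.
targets = {"Mes": 100, "MgCl2": 380, "Li2SO4": 220, "PEG 8000": 8}
stocks = {"Mes": 1000, "MgCl2": 2000, "Li2SO4": 2000, "PEG 8000": 50}
print(pipetting_volumes(targets, stocks, 1000))
# {'Mes': 100.0, 'MgCl2': 190.0, 'Li2SO4': 110.0, 'PEG 8000': 160.0, 'water': 440.0}
```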

b. Implementation of the Incomplete Factorial Method 

The convenience of the modified pipetting system for preparing diverse solutions improved the implementation of an expanded incomplete factorial method. A new set of crystallization solutions having "random" components was generated using the program INFAC, Carter et al., J. Cryst. Growth 90: 60-73 (1988), which produced a list of 96 random combinations of one factor from each of three variables. Combinations of calcium and phosphate, which immediately precipitated, were eliminated, leaving 70 distinct combinations of precipitants, salts and buffers. These combinations were prepared using the automated pipetter and incubated for 1 week. The mixtures were inspected, and solutions that formed precipitates were prepared again with lower concentrations of their components. This was repeated until all wells were clear of precipitate.
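The following minimal sketch shows how such a screen might be generated. Each trial draws one level for each of three variables (precipitant, salt, buffer), and calcium/phosphate combinations and repeats are discarded. Plain random sampling is substituted here for the balanced designs that INFAC produces. The reagent lists are hypothetical and are not the 70 solutions actually used.

```python
import random

# Hypothetical reagent levels for three variables; not the solutions actually used.
PRECIPITANTS = ["PEG 8000", "PEG 4000", "ammonium sulfate", "2-methyl-2,4-pentanediol", "ethanol"]
SALTS = ["no salt", "sodium chloride", "magnesium chloride", "calcium chloride", "lithium sulfate"]
BUFFERS = ["sodium acetate pH 4.5", "Mes pH 5.8", "sodium phosphate pH 6.5", "Tris pH 8.5"]


def incompatible(combo):
    """Flag calcium + phosphate combinations, which precipitate immediately."""
    text = " ".join(combo).lower()
    return "calcium" in text and "phosphate" in text


def random_screen(n_draws=96, seed=0):
    """Draw one level of each variable per trial; drop incompatible and repeated combinations."""
    rng = random.Random(seed)
    screen, seen = [], set()
    for _ in range(n_draws):
        combo = (rng.choice(PRECIPITANTS), rng.choice(SALTS), rng.choice(BUFFERS))
        if incompatible(combo) or combo in seen:
            continue
        seen.add(combo)
        screen.append(combo)
    return screen


screen = random_screen()
print(len(screen), "distinct, compatible combinations from 96 draws")
```

The iterative step that follows in the text (remaking any solution that precipitates at lower component concentrations) would be applied to this list by inspection.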


c. Crystallization of r-hu-G-CSF 

Several different crystallization strategies were used to find a solution that produced x-ray quality crystals. These strategies included the use of the incomplete factorial method, refinement of the crystallization conditions using successive automated grid searches (SAGS), implementation of a seeding technique, and development of a crystal production procedure that yielded hundreds of quality crystals overnight. Unless otherwise noted, the screening and production of r-hu-G-CSF crystals utilized the hanging drop vapor diffusion method. Afinsen et al., Physical principles of protein crystallization. In: Eisenberg (ed.), Advances in Protein Chemistry 41: 1-33 (1991).

The initial screening for crystallization conditions of r-hu-G-CSF used the incomplete factorial method of Jancarik and Kim, J. Appl. Crystallogr. 24: 409 (1991), which resulted in several solutions that produced "precrystallization" results. These results included birefringent precipitates, oils and very small crystals (< 0.05 mm). These precrystallization solutions then served as the starting points for systematic screening.

The screening process required the development of crystallization matrices. These matrices corresponded to the concentrations of the components in the crystallization solutions and were created using the IBM-PC based spreadsheet Lotus™ and implemented with the modified Accuflex pipetting system.

The strategy in designing the matrices was to vary one crystallization condition (such as salt concentration) while holding the other conditions, such as pH and precipitant concentration, constant. At the start of screening, the concentration range of the varied condition was large, but the concentration was successively refined until all wells in the micro-titer tray produced the same crystallization result. These results were scored as follows: crystals, birefringent precipitate, granular precipitate, oil droplets and amorphous mass. If the concentration of a crystallization parameter did not produce at least a precipitate, the concentration of that parameter was increased until a precipitate formed. After each tray was produced, it was left undisturbed for at least two days and then inspected for crystal growth. After this initial screening, the trays were inspected on a weekly basis.
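The one-parameter refinement just described can be sketched as a loop. In each round, the tray is filled with a linear gradient, each well is scored with the categories listed above, and the range is narrowed around the best-scoring wells. This is a minimal sketch under stated assumptions. The numeric ranking of the categories, the extra "clear" category, the shrink factor, and the simulated tray response are all hypothetical. In practice each "observation" is a tray inspected after at least two days.

```python
# Observation categories listed above, ranked here from least to most promising.
# The ranking and the "clear" category are assumptions for this sketch.
RANK = {
    "clear": 0,
    "amorphous mass": 1,
    "oil droplets": 2,
    "granular precipitate": 3,
    "birefringent precipitate": 4,
    "crystals": 5,
}


def refine_range(lo, hi, observe, wells=24, rounds=3, shrink=0.25):
    """Successively narrow one parameter's range around the best-scoring wells.

    observe(concentration) -> category name (a stand-in for inspecting a tray).
    """
    for _ in range(rounds):
        step = (hi - lo) / (wells - 1)
        concentrations = [lo + i * step for i in range(wells)]
        ranks = [RANK[observe(c)] for c in concentrations]
        best = max(ranks)
        hits = [c for c, r in zip(concentrations, ranks) if r == best]
        center = (min(hits) + max(hits)) / 2
        # Keep at least the span of the hits, but shrink the window each round.
        half_width = max((hi - lo) * shrink / 2, (max(hits) - min(hits)) / 2)
        lo, hi = max(0.0, center - half_width), center + half_width
    return lo, hi


def simulated_tray(conc):
    """Hypothetical response: crystals only in a narrow window, precipitate below 1.0."""
    if 0.35 <= conc <= 0.42:
        return "crystals"
    return "granular precipitate" if conc < 1.0 else "clear"


print(refine_range(0.0, 2.5, simulated_tray))
```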

From this screening process, two independent solutions with the same pH and precipitant but differing in salts (MgCl₂, Li₂SO₄) were identified that produced small (0.1 x 0.05 x 0.05 mm) crystals. Based on these results, a new series of concentration matrices was produced that varied MgCl₂ with respect to Li₂SO₄ while keeping the other crystallization parameters constant. This series of experiments resulted in identification of a solution that produced diffraction quality crystals (> approximately 0.5 mm) in about three weeks. To find this crystallization growth solution (100 mM Mes pH 5.8, 380 mM MgCl₂, 220 mM Li₂SO₄ and 8% PEG 8k), approximately 8,000 conditions had been screened, which consumed about 300 mg of protein.
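A matrix that varies one salt with respect to another, as described above, can be laid out on a 24-well tray by running one salt down the four rows and the other across the six columns. The following minimal sketch assumes that layout. The ranges in the example are hypothetical values chosen to bracket the final solution. They are not the matrices actually used.

```python
def two_factor_matrix(row_range, col_range, rows="ABCD", columns=6):
    """Vary factor 1 down the rows and factor 2 across the columns of a 24-well tray.

    Returns {well: (factor1, factor2)}; all other components are held constant.
    """
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    r_step = (r_hi - r_lo) / (len(rows) - 1)
    c_step = (c_hi - c_lo) / (columns - 1)
    return {
        f"{row}{col}": (r_lo + i * r_step, c_lo + (col - 1) * c_step)
        for i, row in enumerate(rows)
        for col in range(1, columns + 1)
    }


# Hypothetical mM ranges bracketing the final solution (380 mM MgCl2, 220 mM Li2SO4).
grid = two_factor_matrix(row_range=(200, 500), col_range=(100, 350))
print(grid["A1"], grid["B2"], grid["D6"])  # (200.0, 100.0) (300.0, 150.0) (500.0, 350.0)
```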

The size of the crystals depended on the number of crystals forming per drop. Typically 3 to 5 crystals would form, with an average size of 1.0 x 0.7 x 0.7 mm. Two morphologies that had an identical space group (P2₁2₁2₁) and unit cell dimensions (a = 90.2 Å, b = 110.2 Å, c = 49.5 Å) were obtained, depending on whether or not seeding (see below) was implemented. Without seeding, the r-hu-G-CSF crystals had one long flat surface and rounded edges.

When seeding was employed, crystals with sharp faces (0.05 by 0.05 by 0.05 mm) were observed in the drop within 4 to 6 hours. Within 24 hours, crystals had grown to 0.7 by 0.7 by 0.7 mm, and they continued to grow beyond 2 mm depending on the number of crystals forming in the drop.

d. Seeding and determination of nucleation initiation sites

The presently provided method for seeding crystals establishes the number of nucleation initiation units in each individual well used (here, after the optimum conditions for growing crystals had been determined). The method is advantageous because the number of "seeds" affects the quality of the crystals, which in turn affects the degree of resolution. Seeding also provides the advantage that G-CSF crystals grow in about 3 days, whereas without seeding growth takes approximately three weeks.

In one series of production growth (see methods), showers of small but well defined crystals (<0.01 x 0.01 x 0.01 mm) were produced overnight. Crystallization conditions were followed as described above, except that a previously used pipette tip had been reused. Presumably, the crystal showering effect was caused by small nucleation units that had formed in the used tip and provided sites of nucleation for the crystals. Addition of a small amount (0.5 µl) of the drops containing the crystal showers to a new drop under standard production growth conditions resulted in a shower of crystals overnight. This method was used to produce several trays of drops containing crystal showers, which we termed "seed stock".

The number of nucleation initiation units (NIU) contained within the "seed stock" drops was estimated in an attempt to improve the reproducibility and quality of the r-hu-G-CSF crystals. To determine the number of NIU in the "seed stock", an aliquot of the drop was serially diluted along a 96 well microtiter plate. The microtiter plate was prepared by adding to each well 50 ul of a solution containing equal volumes of r-hu-G-CSF (33 mg/ml) and the crystal growth solution (CGS, described above). An aliquot (3 ul) of one of the "seed stock" drops was transferred to the first well of the microtiter plate. The solution in the well was mixed, and 3 ul was then transferred to the next well along the row. Each row of the microtiter plate was similarly prepared, and the tray was sealed with plastic tape. Overnight, small crystals formed at the bottom of the wells, and the number of crystals in each well was correlated with the dilution of the original "seed stock". To produce large single crystals, the "seed stock" drop was appropriately diluted into fresh CGS, and an aliquot of this solution containing the NIU was then transferred to a drop.
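The serial dilution above can be turned into an NIU estimate with a few lines of arithmetic. The sketch below is a minimal illustration, not the procedure the authors used: it assumes ideal mixing, that 3 ul is carried from each well into the 50 ul of the next (so each step dilutes roughly 18-fold), that nothing is withdrawn from the last well, and that each nucleation unit gives one countable crystal. The countable window (5 to 200 crystals) and all helper names are hypothetical.

```python
from statistics import mean

# Volumes taken from the text: 50 ul of protein/CGS per well, 3 ul carried per step.
WELL_UL = 50.0
TRANSFER_UL = 3.0

def fraction_in_well(k, n_wells):
    """Fraction of the NIU in the 3 ul seed aliquot that ends up in well k (0-based).

    Assumes ideal mixing and that nothing is withdrawn from the last well.
    """
    total = WELL_UL + TRANSFER_UL
    carried = (TRANSFER_UL / total) ** k        # share that reaches well k
    if k == n_wells - 1:
        return carried                          # last well keeps everything
    return carried * (WELL_UL / total)          # remainder after passing 3 ul on

def niu_per_ul(counts, lo=5, hi=200):
    """Estimate NIU per ul of seed stock from crystal counts along one row.

    Only wells with counts in the (hypothetical) countable window [lo, hi] are used.
    """
    n = len(counts)
    estimates = [
        c / fraction_in_well(k, n) / TRANSFER_UL
        for k, c in enumerate(counts)
        if lo <= c <= hi
    ]
    if not estimates:
        raise ValueError("no well has a countable number of crystals")
    return mean(estimates)

def dilution_for(niu_per_ul_stock, aliquot_ul, target_niu=1.0):
    """Fold-dilution of seed stock so that an aliquot of aliquot_ul carries target_niu."""
    return niu_per_ul_stock * aliquot_ul / target_niu
```

With a 12-well row (one row of a 96 well plate), only the two or three wells that fall in the countable window contribute to the estimate, and `dilution_for` then suggests how far to dilute the seed stock in fresh CGS so that the aliquot added to a drop carries about one NIU, which is the aim when growing large single crystals.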
Once crystallization conditions had been optimized, crystals were grown by a production method in which 3 ml each of CGS and r-hu-G-CSF (33 mg/ml) were mixed to set up 5 trays (each having 24 wells). This method included preparing the refined crystallization solution in liter quantities, mixing this solution with protein, and placing the protein/crystallization solution in either hanging drop or sitting drop trays. This process typically yielded 100 to 300 quality crystals (>0.5 mm) in about 5 days.
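As a quick check on the production batch, the stated volumes imply the drop size and protein use per set of trays. This sketch simply assumes that the 6 ml of mixture is shared evenly among the 120 wells; the variable names are illustrative.

```python
# Figures from the production method described above.
CGS_ML = 3.0
PROTEIN_ML = 3.0
PROTEIN_MG_PER_ML = 33.0
TRAYS = 5
WELLS_PER_TRAY = 24

wells = TRAYS * WELLS_PER_TRAY                          # 120 drops
drop_ul = (CGS_ML + PROTEIN_ML) * 1000.0 / wells        # mixed volume per drop (even split assumed)
protein_mg_total = PROTEIN_ML * PROTEIN_MG_PER_ML       # protein used per batch
protein_mg_per_drop = protein_mg_total / wells

print(f"{wells} drops of {drop_ul:.0f} ul, {protein_mg_per_drop:.3f} mg protein each; "
      f"{protein_mg_total:.0f} mg per batch")
```

The resulting 50 ul drop matches the lower end of the 50 to 100 ul drop volumes given for production growth, and each batch consumes about 99 mg of protein.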

e. Experimental Methods 

Materials 


Crystallographic information was obtained starting with r-hu-met-G-CSF having the amino acid sequence provided in FIGURE 1 and a specific activity of 1.0 +/- 0.6 x 10^8 U/mg (as measured by cell mitogenesis assay), in a 10 mM acetate buffer at pH 4.0 (in Water for Injection) at a concentration of approximately 3 mg/ml. The solution was concentrated with an Amicon concentrator at 75 psi using a YM10 filter. It was typically concentrated 10-fold at 4 °C and stored for several months.

Initial Screening 

Crystals suitable for X-ray analysis were obtained by vapor-diffusion equilibration using hanging drops. For preliminary screening, 7 ul of the protein solution at 33 mg/ml (prepared as above) was mixed with an equal volume of the well solution, placed on siliconized glass plates and suspended over the well solution in Linbro tissue culture plates (Flow Laboratories, McLean, VA). All of the pipetting was performed with the Accuflex pipetter; however, the trays were removed from the automated pipetter after the well solutions had been created, and the solutions were thoroughly mixed for at least 10 minutes on a table top shaker. The Linbro trays were then returned to the pipetter, which added the well and protein solutions to the siliconized cover slips. The cover slips were then inverted and sealed over 1 ml of the well solutions with silicone grease.

The components of the automated crystallization system are as follows. A PC-DOS computer system was used to design a matrix of crystallization solutions based on the concentrations of their components. These matrices were produced with either MRF or the Lotus spreadsheet (described above). The final product of these programs is a data file containing the information required by the SUX program to pipette the appropriate volume of each stock solution to obtain the concentrations described in the matrices. The SUX program information was passed through a serial I/O port and used to direct the Accuflex pipetting system: the position of the valve relative to the stock solutions, the amount of solution to be retrieved and then pipetted into the wells of the microtiter plates, and the X-Y position (column/row) of each well. Additional information transmitted to the pipetter included the Z position (height) of the syringe during filling, as well as the position of a drain where the system pauses to purge the syringe between fillings with different solutions. The 24 well microtiter plate (either Linbro or Cryschem) and the cover slip holder were placed on a plate that moved in the X-Y plane. Movement of the plate allowed the pipetter to position the syringe to pipette into the wells; it also brought the cover slips and vials into position so that solutions could be extracted from these sources. Prior to pipetting, a thin film of grease was applied around the edges of the wells of the Linbro microtiter plates. After the crystallization solutions were prepared in the wells, and before they were transferred to the cover slips, the microtiter plate was removed from the pipetting system and the solutions were allowed to mix on a table top shaker for ten minutes. After mixing, the well solution was transferred either to the cover slips (in the hanging drop protocol) or to the middle post in the well (in the sitting drop protocol). Protein was extracted from a vial and added to the cover slip drop containing the well solution (or to the post). Plastic tape was applied to the top of the Cryschem plate to seal the wells.
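The matrix-to-volume step performed by the MRF/Lotus and SUX programs amounts to a C1V1 = C2V2 calculation for every well. The Python sketch below illustrates that idea for a 4 x 6 (24 well) plate with a 1 ml well volume. It is not the SUX file format, and the stock concentrations in `STOCKS` are hypothetical placeholders because the text does not give them. The example grid brackets the production conditions (380 mM MgCl2, 8% PEG 8K, 100 mM MES) described below.

```python
from itertools import product

WELL_UL = 1000.0   # 1 ml of well solution per well, as in the text

# Hypothetical stock concentrations; the actual stocks are not given in the text.
STOCKS = {"MES mM": 1000.0, "MgCl2 mM": 2000.0, "PEG8K %": 50.0}

def volumes_for(targets):
    """Volume (ul) of each stock, plus water, giving the target concentrations."""
    vols = {name: WELL_UL * conc / STOCKS[name] for name, conc in targets.items()}
    water = WELL_UL - sum(vols.values())
    if water < 0:
        raise ValueError(f"targets not reachable from these stocks: {targets}")
    vols["water"] = water
    return vols

def plan_24_well(row_key, row_values, col_key, col_values, fixed):
    """Rows A-D vary row_key, columns 1-6 vary col_key; returns (well, volumes) pairs."""
    if len(row_values) != 4 or len(col_values) != 6:
        raise ValueError("a 24-well plate is laid out as 4 rows x 6 columns")
    plan = []
    for (r, rv), (c, cv) in product(enumerate(row_values), enumerate(col_values)):
        targets = {**fixed, row_key: rv, col_key: cv}
        plan.append((f"{'ABCD'[r]}{c + 1}", volumes_for(targets)))
    return plan

plan = plan_24_well("MgCl2 mM", [200, 280, 380, 460],
                    "PEG8K %", [4, 6, 8, 10, 12, 14],
                    {"MES mM": 100.0})
```

Each `(well, volumes)` pair corresponds to the information the text says was sent to the pipetter (which stock, how much, and which row/column); a real driver would also need the Z height and purge position, which are omitted here.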

Production Growth 

Once conditions for crystallization had been optimized, crystal growth was performed using a "production" method. The crystallization solution, which contained 100 mM MES pH 5.8, 380 mM MgCl2, 220 mM Li2SO4 and 8% PEG 8K, was made in 1 liter quantities. Using an Eppendorf syringe pipetter, 1 ml aliquots of this solution were pipetted into each of the wells of the Linbro plate. A solution containing 50% of this solution and 50% G-CSF (33 mg/ml) was mixed and pipetted onto the siliconized cover slips. Typical volumes of these drops were between 50 and 100 ul, and because of the large size of these drops, great care was taken in flipping the cover slips and suspending the drops over the wells.
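A 1:1 drop starts at half the reservoir's precipitant concentration and then loses water to the reservoir by vapor diffusion. The sketch below computes the starting drop composition and an idealized end point in which the drop shrinks until its precipitant matches the reservoir. That equilibrium model is a standard simplification assumed here, not something stated in the text, and it ignores the contribution of the protein itself.

```python
# Reservoir ("well") solution used for production growth, per the text.
RESERVOIR = {"MES mM": 100.0, "MgCl2 mM": 380.0, "Li2SO4 mM": 220.0, "PEG 8K %": 8.0}
PROTEIN_MG_PER_ML = 33.0

def initial_drop(reservoir, protein, frac_reservoir=0.5):
    """Concentrations in a freshly mixed drop (default 50% reservoir / 50% protein)."""
    drop = {k: v * frac_reservoir for k, v in reservoir.items()}
    drop["G-CSF mg/ml"] = protein * (1.0 - frac_reservoir)
    return drop

def equilibrated_drop(drop, reservoir):
    """Idealized end point: water leaves the drop until its precipitant matches the
    reservoir, so every drop component is scaled by the same factor."""
    key = "PEG 8K %"                     # any reservoir component gives the same factor
    factor = reservoir[key] / drop[key]
    return {k: v * factor for k, v in drop.items()}, factor
```

Under this idealization the G-CSF in the drop starts near 16.5 mg/ml and is concentrated about two-fold as the drop equilibrates.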
Data Collection

The structure has been refined with X-PLOR (Brünger, X-PLOR version 3.0, A System for Crystallography and NMR, Yale University, New Haven, CT) against 2.2 Å data collected on an R-AXIS (Molecular Structure Corp., Houston, TX) imaging plate detector.

f. Observations 

As r-hu-G-CSF is an effective recombinant human therapeutic, it has been produced in large quantities, and gram amounts have been made available for structural analysis. The crystallization methods provided herein are likely to find other applications as other proteins of interest become available. This method can be applied to any crystallographic project that has large quantities of protein (more than approximately 200 mg). As one skilled in the art will recognize, the present materials and methods may be modified, and equivalent materials and methods may be available for the crystallization of other proteins.


B. Computer Program For Visualizing The Three Dimensional Structure of G-CSF 

Although diagrams, such as those in the Figures herein, are useful for visualizing the three dimensional structure of G-CSF, a computer program that allows for stereoscopic viewing of the molecule is contemplated as preferred. This stereoscopic viewing, or "virtual reality" as those in the art sometimes refer to it, allows one to visualize the structure in its three dimensional form from every angle over a wide range of resolution, from macromolecular structure down to the atomic level. The computer programs contemplated herein also allow one to change the viewing angle of the molecule, for example by rotating it. The contemplated programs also respond to changes, so that one may, for example, delete, add, or substitute one or more images of atoms, including entire amino acid residues, or add chemical moieties to existing or substituted groups, and visualize the resulting change in structure.

Other computer based systems may be used, having the following elements: (a) a means for entering information, such as orthogonal coordinates or other numerically assigned coordinates of the three dimensional structure of G-CSF; (b) a means for expressing such coordinates, such as visual means, so that one may view the three dimensional structure and correlate it with the composition of the G-CSF molecule, such as the amino acid composition; and (c) optionally, a means for entering information that alters the composition of the G-CSF molecule expressed, so that the image of the three dimensional structure displays the altered composition.
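Elements (a) to (c) can be illustrated with a small data model. The sketch below assumes the coordinates are supplied in the fixed-column PDB ATOM format (the format of the Protein Data Bank mentioned below) and that only one chain is present. It reads coordinates (a), maps residue numbers to amino acid names (b), and models an X->Ala substitution by keeping only the backbone and C-beta atoms (c). This truncation is a common quick modeling shortcut, not the Insight II procedure, and all function names are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_pdb_atoms(text):
    """(a) Enter coordinates: read ATOM records using the PDB fixed columns."""
    atoms = []
    for line in text.splitlines():
        if line.startswith("ATOM  "):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]), y=float(line[38:46]), z=float(line[46:54]),
            ))
    return atoms

def residue_composition(atoms):
    """(b) Correlate coordinates with composition: residue number -> residue name."""
    return {a.res_seq: a.res_name for a in atoms}

BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}

def truncate_to_ala(atoms, res_seq):
    """(c) Alter composition: model X->Ala at res_seq by keeping backbone + CB only."""
    out = []
    for a in atoms:
        if a.res_seq != res_seq:
            out.append(a)
        elif a.name in BACKBONE_PLUS_CB:
            out.append(replace(a, res_name="ALA"))
    return out
```

A viewer would then redraw the structure from the returned atom list; no geometry optimization is attempted.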

The coordinates for the preferred computer program used are presented in FIGURE 5. The preferred computer program is Insight II, version 4, available from Biosym in San Diego, CA. For the raw crystallographic structure, the observed intensities of the diffraction data ("F-obs") and the orthogonal coordinates are also deposited in the Protein Data Bank, Chemistry Department, Brookhaven National Laboratory, Upton, New York 11973, USA, and these are herein incorporated by reference.

Once the coordinates are entered into the Insight II program, one can easily display a representation of the three dimensional G-CSF molecule on a computer screen. The preferred computer system for display is a Silicon Graphics 320 VGX (San Diego, CA). For stereoscopic viewing, one may wear eyewear (Crystal Eyes, Silicon Graphics) that allows one to view the G-CSF molecule stereoscopically in three dimensions, so that one may turn the molecule and envision molecular design.

Thus, the present invention provides a method of designing or preparing a G-CSF analog with the aid of a computer, comprising:

(a) providing said computer with the means for displaying the three dimensional structure of a G-CSF 
molecule including displaying the composition of moieties of said G-CSF molecule, preferably displaying 
the three dimensional location of each amino acid, and more preferably displaying the three dimensional 
location of each atom of a G-CSF molecule; 
(b) viewing said display;

(c) selecting a site on said display for alteration in the composition of said molecule or the location of a 
moiety; and 

(d) preparing a G-CSF analog with such alteration. 

The alteration may be selected based on the desired structural characteristics of the end-product G-CSF analog, and considerations for such design are described in more detail below. Such considerations include the location and composition of hydrophobic amino acid residues, particularly residues internal to the helical structures of a G-CSF molecule, which, when altered, alter the overall structure of the internal core of the molecule and may prevent receptor binding; and the location and composition of external loop structures, alteration of which may not affect the overall structure of the G-CSF molecule.
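One simple way to act on these considerations is to flag buried hydrophobic residues (likely core, where changes may perturb the helical bundle) separately from exposed residues (candidate loop or surface sites). The sketch below uses a C-alpha contact count as a crude burial measure. The 10 Å radius and the cutoff of 20 neighbors are illustrative assumptions, not values from the text, and a real design would still inspect the structure visually as described above.

```python
from math import dist

HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "CYS"}

def contact_numbers(ca_coords, radius=10.0):
    """For each residue, count the other C-alpha atoms within `radius` angstroms.

    ca_coords: list of (residue_number, (x, y, z)) pairs.
    """
    return {
        res_i: sum(
            1 for j, (_, xyz_j) in enumerate(ca_coords)
            if j != i and dist(xyz_i, xyz_j) <= radius
        )
        for i, (res_i, xyz_i) in enumerate(ca_coords)
    }

def flag_sites(ca_coords, res_names, radius=10.0, buried_cutoff=20):
    """Split residues into buried hydrophobics (likely core, avoid) and
    exposed residues (candidate loop/surface sites for substitution)."""
    counts = contact_numbers(ca_coords, radius)
    buried_hydrophobic = sorted(
        r for r, n in counts.items()
        if n >= buried_cutoff and res_names[r] in HYDROPHOBIC
    )
    exposed = sorted(r for r, n in counts.items() if n < buried_cutoff)
    return buried_hydrophobic, exposed
```

Contact counts are only a proxy for burial; solvent-accessible surface area would be a more faithful measure if it is available.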

FIGURES 2-4 illustrate the overall three dimensional conformation in different ways. The topological diagram, the ribbon diagram, and the barrel diagram each illustrate aspects of the conformation of G-CSF. FIGURE 2 illustrates a comparison between G-CSF and other molecules. There is a similarity of architecture, although these growth factors differ in the local conformations of their loops and in their bundle geometries. The up-up-down-down topology with two long crossover connections is, however, conserved among all six of these molecules, despite the dissimilarity in amino acid sequence.

FIGURE 3 illustrates in more detail the secondary structure of recombinant human G-CSF. This ribbon diagram illustrates the handedness of the helices and their positions relative to each other. FIGURE 4 illustrates the conformation of recombinant human G-CSF in a different way. This "barrel" diagram illustrates the overall architecture of recombinant human G-CSF.

C. Preparation of Analogs Using M13 Mutagenesis 

This example relates to the preparation of G-CSF analogs using site directed mutagenesis techniques involving the single stranded bacteriophage M13, according to methods published in PCT Application No. WO 85/00817 (Souza et al., published February 28, 1985, herein incorporated by reference). This method essentially involves using a single-stranded nucleic acid template of the non-mutagenized sequence and binding to it a smaller oligonucleotide containing the desired change in the sequence. Hybridization conditions allow the non-identical sequences to hybridize, and the remaining sequence is filled in to be identical to the original template. The result is a double stranded molecule with one of its two strands containing the desired change. This mutagenized single strand is separated and is itself used as a template for its complementary strand, creating a double stranded molecule with the desired change.
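The mutagenic oligonucleotide is essentially the wild-type sequence around the target codon with the new codon placed in the middle, so that the mismatch is flanked by perfectly paired bases. The sketch below assembles such an oligo from a coding sequence; the flank length, the toy sequence used in the example and the function names are hypothetical. Which strand to synthesize depends on the orientation of the insert in the single stranded M13 template: if the phage carries the coding strand, the reverse complement is used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def mutagenic_oligo(coding_seq, codon_number, new_codon, flank=9):
    """Sense-strand oligo replacing codon `codon_number` (1-based) with new_codon,
    flanked by up to `flank` unchanged bases on each side.

    Returns (oligo, number_of_mismatched_bases).
    """
    seq = coding_seq.upper()
    new_codon = new_codon.upper()
    if codon_number < 1 or len(new_codon) != 3:
        raise ValueError("codon_number must be >= 1 and new_codon must be 3 nt")
    start = 3 * (codon_number - 1)
    old_codon = seq[start:start + 3]
    if len(old_codon) != 3:
        raise ValueError("codon_number is outside the coding sequence")
    left = seq[max(0, start - flank):start]
    right = seq[start + 3:start + 3 + flank]
    mismatches = sum(a != b for a, b in zip(old_codon, new_codon))
    return left + new_codon + right, mismatches

# Toy example: codons ATG AAA CAT GGG; replace codon 3 (CAT, His) with GCT (Ala).
oligo, mm = mutagenic_oligo("ATGAAACATGGG", 3, "GCT", flank=3)
```

Keeping the number of mismatched bases small and centred in the oligo favors stable annealing to the template despite the change.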

The original G-CSF nucleic acid sequence used is presented in FIGURE 1, and the oligonucleotides containing the mutagenized nucleic acid(s) are presented in Table 2. Abbreviations used herein for amino acid residues and nucleotides are conventional; see Stryer, Biochemistry, 3d Ed., W.H. Freeman and Company, New York, N.Y., 1988, inside back cover.

The original G-CSF nucleic acid sequence was first placed into vector M13mp21. The DNA from single stranded phage M13mp21 containing the original G-CSF sequence was then isolated and resuspended in water. For each reaction, 200 ng of this DNA was mixed with 1.5 pmol of phosphorylated oligonucleotide (Table 2) and suspended in 0.1 M Tris, 0.01 M MgCl2, 0.005 M DTT, 0.1 mM ATP, pH 8.0. The DNAs were annealed by heating to 65 °C and slowly cooling to room temperature.

Once cooled, 0.5 mM each of ATP, dATP, dCTP, dGTP and TTP, 1 unit of T4 DNA ligase and 1 unit of the Klenow fragment of E. coli DNA polymerase I were added to the 1 unit of annealed DNA in 0.1 M Tris, 0.025 M NaCl, 0.01 M MgCl2, 0.01 M DTT, pH 7.5.

The now double stranded, closed circular DNA was used to transfect E. coli without further purification. Plaques were screened by lifting them with nitrocellulose filters and then hybridizing the filters with single stranded DNA end-labeled with 32P for 1 hour at 55-60 °C. After hybridization, the filters were washed at 0-3 °C below the melting temperature of the oligonucleotide (estimated as 2 °C for each A-T and 4 °C for each G-C base pair), which selectively left autoradiography signals corresponding to plaques with phage containing the mutated sequence. Positive clones were confirmed by sequencing.
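The wash temperature rule above (2 °C per A-T and 4 °C per G-C) is the Wallace rule for short oligonucleotides. A minimal sketch, applied for illustration to the first internal primer in Table 3; the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature (deg C): 2 per A-T plus 4 per G-C.
    Intended for short oligonucleotides only."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))

def wash_range(oligo, below=(0, 3)):
    """Wash temperatures 0-3 deg C below the estimated Tm, as described above."""
    tm = wallace_tm(oligo)
    return tm - below[1], tm - below[0]

print(wash_range("TTCCGGAGCGCACAGTTTG"))
```

For that 19-mer the estimate is 60 °C, so the wash would be run between about 57 and 60 °C. The rule is only a rough guide and becomes unreliable for longer oligonucleotides.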

Set forth below are the oligonucleotides used for each G-CSF analog prepared via the M13 mutagenesis method. The nomenclature indicates the residue and position of the original amino acid (e.g., lysine at position 17) and the residue and position of the substituted amino acid (e.g., arginine 17). A substitution involving more than one residue is indicated via superscript notation, with commas between the noted positions or a semicolon indicating different residues. Deletions with no substitutions are so noted. The oligonucleotide sequences used for M13-based mutagenesis are indicated next; these oligonucleotides were manufactured synthetically, although the method of preparation is not critical and any nucleic acid synthesis method and/or equipment may be used. The length of each oligonucleotide is also indicated. As described above, these oligonucleotides were allowed to contact the single stranded phage vector, and single nucleotides were then added to complete the G-CSF analog nucleic acid sequence.
D. Preparation of G-CSF Analogs Using DNA Amplification

This example relates to methods for producing G-CSF analogs using a DNA amplification technique. Essentially, DNA encoding each analog was amplified in two separate pieces, the pieces were combined, and the total sequence itself was then amplified. Depending upon where the desired change in the original G-CSF DNA was to be made, internal primers were used to incorporate the change and generate the two separate amplified pieces. For example, for amplification of the 5' end of the desired analog DNA, a 5' flanking primer (complementary to a sequence of the plasmid upstream from the original G-CSF DNA) was used at one end of the region to be amplified, and an internal primer, capable of hybridizing to the original DNA but incorporating the desired change, was used to prime the other end. The resulting amplified region stretched from the 5' flanking primer through the internal primer. The same was done for the 3' terminus, using a 3' flanking primer (complementary to a sequence of the plasmid downstream from the original G-CSF DNA) and an internal primer complementary to the region of the intended mutation. Once the two "halves" (which may or may not be equal in size, depending on the location of the internal primer) were amplified, they were allowed to connect. Once connected, the 5' flanking primer and the 3' flanking primer were used to amplify the entire sequence containing the desired change.
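The logic of the two-"halves" approach (often called overlap extension PCR) can be modeled on strings. The sketch below is an idealization: it treats the template as running exactly from the 5' flanking primer site to the 3' flanking primer site, assumes perfect priming, and uses a hypothetical overlap centred on the change. It shows why the halves share the internal primer region and how joining them yields the full mutated sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def overlap_extension(template, pos, new_bases, overlap=20):
    """Model the two amplified 'halves' and the joined product for a substitution.

    template : sense strand spanning the 5' to the 3' flanking-primer sites
    pos      : 0-based start of the bases to replace
    overlap  : approximate length of the internal primer region shared by both halves
    """
    new_bases = new_bases.upper()
    mutated = template[:pos] + new_bases + template[pos + len(new_bases):]
    pad = max(0, overlap - len(new_bases)) // 2
    lo = max(0, pos - pad)
    hi = min(len(mutated), pos + len(new_bases) + pad)
    internal_fwd = mutated[lo:hi]                     # primes the 3' half (sense)
    internal_rev = reverse_complement(internal_fwd)   # primes the 5' half (antisense)
    half_5 = mutated[:hi]      # from the 5' flanking primer through the internal primer
    half_3 = mutated[lo:]      # from the internal primer through the 3' flanking primer
    shared = hi - lo
    if shared == 0 or half_5[-shared:] != half_3[:shared]:
        raise RuntimeError("halves do not overlap")
    full = half_5 + half_3[shared:]                   # annealed at the overlap, then extended
    return {"internal_fwd": internal_fwd, "internal_rev": internal_rev,
            "half_5": half_5, "half_3": half_3, "full": full}
```

In practice the two internal primers need not be exact reverse complements of each other; they only have to overlap enough for the halves to anneal.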

If more than one change is desired, the above process may be modified to incorporate the changes into the internal primer, or the process may be repeated using a different internal primer. Alternatively, the gene amplification process may be used together with other methods for creating changes in a nucleic acid sequence, such as the phage based mutagenesis technique described above. Examples of processes for preparing analogs with more than one change are described below.

To create the G-CSF analogs described below, the template DNA used was the sequence as in FIGURE 1 plus certain flanking regions (from a plasmid containing the G-CSF coding region). These flanking regions provided the 5' and 3' flanking primer sequences, which are set forth below. The amplification reactions were performed in 40 ul volumes containing 10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, 0.1 mg/ml gelatin, pH 8.3 at 20 °C. The 40 ul reactions also contained 0.1 mM of each dNTP, 10 pmol of each primer, and 1 ng of template DNA. Each amplification was repeated for 15 cycles, each cycle consisting of 0.5 minutes at 94 °C, 0.5 minutes at 50 °C, and 0.75 minutes at 72 °C. Flanking primers were 20 nucleotides in length and internal primers were 20 to 25 nucleotides in length. This resulted in multiple copies of double stranded DNA encoding either the front portion or the back portion of the desired G-CSF analog.

For combining the two "halves", one fortieth of each of the two reactions was combined in a third DNA amplification reaction. The two portions were allowed to anneal at the internal primer location, as their ends bearing the mutation were complementary, and following a cycle of polymerization they gave rise to a full length DNA sequence. Once so annealed, the whole analog was amplified using the 5' and 3' flanking primers. This amplification process was repeated for 15 cycles as described above.
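The cycling programs in this section are easy to represent as data, which also makes the run time explicit. The sketch below records the program used for the halves and the joining step and, for comparison, the program used later for the double mutation. The hold times exclude ramping between temperatures, which depends on the instrument; the names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float

def hold_minutes(steps, cycles):
    """Total time spent at the programmed temperatures (ramp times excluded)."""
    return cycles * sum(step.minutes for step in steps)

# Cycling conditions stated in the text.
HALVES_PROGRAM = [Step(94, 0.5), Step(50, 0.5), Step(72, 0.75)]   # run for 15 cycles
DOUBLE_MUTANT_PROGRAM = [Step(94, 1), Step(50, 2), Step(72, 3)]   # run for 30 cycles

REACTION_UL = 40.0
carry_over_ul = REACTION_UL / 40      # "one fortieth" of each half-reaction

print(hold_minutes(HALVES_PROGRAM, 15), hold_minutes(DOUBLE_MUTANT_PROGRAM, 30), carry_over_ul)
```

So each 15-cycle run spends about 26 minutes at temperature, and 1 ul of each 40 ul half-reaction is carried into the joining reaction.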

The completed, amplified analog DNA sequence was cleaved with XbaI and XhoI restriction endonucleases to produce cohesive ends for insertion into a vector. The cleaved DNA was placed into a plasmid vector, and that vector was used to transform E. coli. Transformants were challenged with kanamycin at 50 ug/ml and incubated at 30 °C. Production of G-CSF analog protein was confirmed by polyacrylamide gel electrophoresis of a whole cell lysate. The presence of the desired mutation was confirmed by DNA sequence analysis of plasmid purified from the production isolate. Cultures were then grown, the cells were harvested, and the G-CSF analogs were purified as set forth below.

Set forth below in Table 3 are the specific primers used for each analog made using gene amplification.
Table 3

| Analog | Internal Primer (5'->3') | Seq. ID |
|---|---|---|
| His44->Ala44 | 5' primer: TTCCGGAGCGCACAGTTTG | 49 |
| | 3' primer: CAAACTGTGGGCTCCGGAAGAGC | 50 |
| Thr117->Ala117 | 5' primer: ATGCCAAATTGCAGTAGCAAAG | 51 |
| | 3' primer: CTTTGCTACTGCAATTTGGCAACA | 52 |
| Asp110->Ala110 | 5' primer: ATCAGCTACTGCTAGCTGCAGA | 53 |
| | 3' primer: TCTGCAGCTAGCAGTAGCTGACT | 54 |
| Gln21->Ala21 | 5' primer: TTACGAACCGCTTCCAGACATT | 55 |
| | 3' primer: AATGTCTGGAAGCGGTTCGTAAAAT | 56 |
| Asp113->Ala113 | 5' primer: GTAGCAAATGCAGCTACATCTA | 57 |
| | 3' primer: TAGATGTAGCTGCATTTGCTACTAC | 58 |
| His53->Ala53 | 5' primer: CCAAGAGAAGCACCCAGCAG | 59 |
| | 3' primer: CTGCTGGGTGCTTCTCTTGGGA | 60 |

For each analog, the following 5' flanking primer was used: 5'-CACTGGCGGTGATAATGAGC (Seq. ID 61)

For each analog, the following 3' flanking primer was used: 3'-GGTCATTACGGACCGGATC (Seq. ID 62)



1. Construction of Double Mutation

To make G-CSF analog Gln12,21->Glu12,21, two separate DNA amplifications were conducted to create the two DNA mutations. The template DNA used was the sequence as in FIGURE 1 plus certain flanking regions (from a plasmid containing the G-CSF coding region). The precise sequences are listed below. Each of the two DNA amplification reactions was carried out using a Perkin Elmer/Cetus DNA Thermal Cycler. The 40 ul reaction mix consisted of 1X PCR Buffer (Cetus), 0.2 mM each of the 4 dNTPs (Cetus), 50 pmol of each primer oligonucleotide, 2 ng of G-CSF template DNA (on a plasmid vector), and 1 unit of Taq polymerase (Cetus). The amplification process was carried out for 30 cycles, each cycle consisting of 1 minute at 94 °C, 2 minutes at 50 °C, and 3 minutes at 72 °C.

DNA amplification "A" used the oligonucleotides: 
5' CCACTGGCGGTGATACTGAGC 3' (Seq. ID 63) and 
5' AGCAGAAAGCTTTCCGGCAGAGAAGAAGCAGGA 3' (Seq. ID 64) 

DNA amplification "B" used the oligonucleotides:
5' GCCGCAAAGCTTTCTGCTGAAATGTCTGGAAGAGGTTCGTAAAATCCAGGGTGA 3' (Seq. ID 65) and
5' CTGGAATGCAGAAGCAAATGCCGGCATAGCACCTTCAGTCGGTTGCAGAGCTGGTGCCA 3' (Seq. ID 66)

From the 109 base pair double stranded DNA product obtained after DNA amplification "A", a 64 base pair XbaI to HindIII DNA fragment containing the DNA mutation Gln12->Glu12 was cut and isolated. From the 509 base pair double stranded DNA product obtained after DNA amplification "B", a 197 base pair HindIII to BsmI DNA fragment containing the DNA mutation Gln21->Glu21 was cut and isolated.
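The fragment choices above rely on recognition sites carried by the primers. A small scan for the XbaI (TCTAGA), HindIII (AAGCTT) and BsmI (GAATGC) recognition sequences on both strands illustrates this. The function is a hypothetical helper; it reports 0-based positions of the recognition sequence rather than of the cut, and it does not model cleavage efficiency near DNA ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences of the enzymes named in this section.
SITES = {"XbaI": "TCTAGA", "HindIII": "AAGCTT", "BsmI": "GAATGC"}

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_sites(seq, sites=SITES):
    """Return {enzyme: sorted 0-based start positions} for sites on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = set()
        for motif in {site, reverse_complement(site)}:   # palindromes give a single motif
            start = seq.find(motif)
            while start != -1:
                positions.add(start)
                start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = sorted(positions)
    return hits
```

Applied to the primers listed above, it finds the HindIII site that joins fragments "A" and "B" in Seq. ID 64 and 65, and a BsmI site in Seq. ID 66.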

The "A" and "B" fragments were ligated together with a 4.8 kilobase pair XbaI to BsmI DNA plasmid vector fragment. The ligation mix consisted of equimolar DNA restriction fragments, ligation buffer (25 mM Tris-HCl pH 7.8, 10 mM MgCl2, 2 mM DTT, 0.5 mM rATP, and 100 ug/ml BSA) and T4 DNA ligase, and was incubated overnight at 14 °C. The ligated DNA was then transformed into E. coli FM5 cells by electroporation using a Bio-Rad Gene Pulser apparatus (Bio-Rad, Richmond, CA). A clone was isolated, and the plasmid construct was verified by DNA sequencing to contain the two mutations. This "intermediate" vector also contained a deletion of a 193 base pair BsmI to BsmI DNA fragment. The final plasmid vector was constructed by ligation and transformation (as described above) of DNA fragments obtained by cutting and isolating a 2 kilobase pair SstI to BamHI DNA fragment from the intermediate vector, a 2.8 kbp SstI to EcoRI DNA fragment from the plasmid vector, and a 360 bp BamHI to EcoRI DNA fragment from the plasmid vector. The final construct was verified by DNA sequencing of the G-CSF gene. Cultures were grown, the cells were harvested, and the G-CSF analogs were purified as set forth below.

As indicated above, any combination of mutagenesis techniques may be used to generate a G-CSF analog nucleic acid (and expression product) having one or more alterations. The two examples above, using M13-based mutagenesis and gene amplification-based mutagenesis, are illustrative.

E. Expression of G-CSF Analog DNA 

The G-CSF analog DNAs were then placed into a plasmid vector and used to transform E. coli strain FM5 (ATCC #53911). The present G-CSF analog DNAs contained on plasmids and in bacterial host cells are available from the American Type Culture Collection, Rockville, MD, and the accession designations are indicated below.

One liter cultures were grown in broth (containing 10 g tryptone, 5 g yeast extract and 5 g NaCl) at 30 °C until reaching a density at A600 of 0.5, at which point they were rapidly heated to 42 °C. The flasks were then allowed to continue shaking for three hours.

Other prokaryotic or eukaryotic host cells may also be used, such as other bacterial cells, strains or species; mammalian cells in culture (COS, CHO or other types); insect cells or multicellular organs or organisms; or plant cells or multicellular organs or organisms. A skilled practitioner will recognize the appropriate host. The present G-CSF analogs and related compositions may also be prepared synthetically, for example by solid phase peptide synthesis methods or other chemical manufacturing techniques. Other cloning and expression systems will be apparent to those skilled in the art.

F. Purification of G-CSF Analog Protein 

Cells were harvested by centrifugation (10,000 x g, 20 minutes, 4 °C). The pellet (usually 5 grams) was resuspended in 30 ml of 1 mM DTT and passed three times through a French press cell at 10,000 psi. The broken cell suspension was centrifuged at 10,000 x g for 30 minutes, the supernatant removed, and the pellet resuspended in 30-40 ml water. This was recentrifuged at 10,000 x g for 30 minutes, and the resulting pellet was dissolved in 25 ml of 2% Sarkosyl and 50 mM Tris at pH 8. Copper sulfate was added to a concentration of 40 uM, and the mixture was allowed to stir for at least 15 hours at 15-25 °C. The mixture was then centrifuged at 20,000 x g for 30 minutes. The resulting solubilized protein mixture was diluted four-fold with 13.3 mM Tris, pH 7.7, the Sarkosyl was removed, and the supernatant was then applied to a DEAE-cellulose (Whatman DE-52) column equilibrated in 20 mM Tris, pH 7.7. After loading and washing the column with the same buffer, the analogs were eluted with 20 mM Tris/NaCl (35 mM to 100 mM NaCl, depending on the analog, as indicated below), pH 7.7. For most of the analogs, the eluate from the DEAE column was adjusted to pH 5.4 with 50% acetic acid and diluted as necessary (to obtain the proper conductivity) with 5 mM sodium acetate, pH 5.4. The solution was then loaded onto a CM-Sepharose column equilibrated in 20 mM sodium acetate, pH 5.4. The column was washed with 20 mM NaAc, pH 5.4 until the absorbance at 280 nm was approximately zero. The G-CSF analog was then eluted with sodium acetate/NaCl at the concentrations given below in Table 4. The DEAE column eluates for those analogs not applied to the CM-Sepharose column were dialyzed directly into 10 mM NaAc, pH 4.0 buffer. The purified G-CSF analogs were then suitably isolated for in vitro analysis. The salt concentrations used for eluting the analogs varied, as noted above. The salt concentrations for the DEAE cellulose column and for the CM-Sepharose column are listed below:
Table 4
Salt Concentrations

| Analog | DEAE Cellulose | CM-Sepharose |
|---|---|---|
| Lys17->Arg17 | 35mM | 37.5mM |
| Lys24->Arg24 | 35mM | 37.5mM |
| Lys35->Arg35 | 35mM | 37.5mM |
| Lys41->Arg41 | 35mM | 37.5mM |
| Lys17,24,35->Arg17,24,35 | 35mM | 37.5mM |
| Lys17,35,41->Arg17,35,41 | illegible | illegible |
| Lys24,35,41->Arg24,35,41 | 35mM | 37.5mM |
| Lys17,24,35,41->Arg17,24,35,41 | 35mM | 37.5mM |
| Lys17,24,41->Arg17,24,41 | illegible | 37.5mM |
| Gln->Glu (positions illegible) | 60mM | 37.5mM |
| Substitution at positions 37,43 (residues illegible) | 40mM | 37.5mM |
| Gln->Ala (position illegible) | 40mM | 40mM |
| Gln174->Ala174 | 40mM | 40mM |
| Arg170->Ala170 | illegible | 40mM |
| Arg167->Ala167 | 40mM | 40mM |
| Deletion167* | N/A | N/A |
| (analog illegible) | 160mM | 40mM |
| His44->Lys44 | 40mM | 60mM |
| Glu->Ala (position illegible) | 40mM | 40mM |
| Arg23->Ala23 | 40mM | 40mM |
| Lys24->Ala24 | 120mM | 40mM |
| Glu20->Ala20 | 40mM | 60mM |
| Asp28->Ala28 | illegible | 80mM |
| Met127->Glu127 | 80mM | illegible |
| Met138->Glu138 | 80mM | 40mM |
| Met127->Leu127 | 40mM | 40mM |
| Met138->Leu138 | 40mM | 40mM |
| Cys18->Ala18 | 40mM | 37.5mM |
| Gln12,21->Glu12,21 | 60mM | 37.5mM |
| Gln12,21,68->Glu12,21,68 | 60mM | 37.5mM |
| Glu20->Ala20; Ser13->Gly13 | 40mM | 80mM |
| Met127,138->Leu127,138 | 40mM | 40mM |
| Ser13->Ala13 | 40mM | 40mM |
| Lys17->Ala17 | 80mM | 40mM |
| Gln121->Ala121 | 40mM | 60mM |
| Gln21->Ala21 | 50mM | Gradient 0-150mM |
| His44->Ala44** | 40mM | N/A |
| His53->Ala53** | 50mM | N/A |
| Asp110->Ala110** | 40mM | N/A |
| Asp113->Ala113** | 40mM | N/A |
| Thr117->Ala117** | 50mM | N/A |
| Asp28->Ala28; Asp110->Ala110** | 50mM | N/A |
| Glu124->Ala124** | 40mM | 40mM |

* For Deletion167, the data are unavailable.

** For these analogs, the DEAE cellulose column alone was used for purification.

The above purification methods are illustrative, and a skilled practitioner will recognize that other means are available for obtaining the present G-CSF analogs.
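The dilution step after copper treatment in the purification above can be checked with simple mixing arithmetic. The sketch assumes that "diluted four-fold" means one volume of the solubilized mixture plus three volumes of 13.3 mM Tris, and that the diluent contains no copper; Sarkosyl is left out because it is removed afterwards. The helper name is hypothetical.

```python
def after_dilution(sample_conc, diluent_conc, fold):
    """Concentration after mixing 1 volume of sample with (fold - 1) volumes of diluent."""
    return (sample_conc + (fold - 1) * diluent_conc) / fold

# Solubilized pellet: 50 mM Tris and 40 uM CuSO4; diluent: 13.3 mM Tris (assumed copper-free).
tris_mM = after_dilution(50.0, 13.3, 4)
copper_uM = after_dilution(40.0, 0.0, 4)

print(round(tris_mM, 3), copper_uM)
```

On that reading the load is about 22.5 mM Tris, close to the 20 mM Tris used to equilibrate the DEAE-cellulose column, with roughly 10 uM copper.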

G. Biological Assays 

Regardless of which methods were used to create the present G-CSF analogs, the analogs were subjected to assays for biological activity. Tritiated thymidine assays were conducted to ascertain the degree of cell division. Other biological assays, however, may be used to ascertain the desired activity. Biological assays such as assaying for the ability to induce terminal differentiation in the mouse WEHI-3B (D+) leukemic cell line also provide an indication of G-CSF activity. See Nicola et al., Blood 54: 614-27 (1979). Other in vitro assays may be used to ascertain biological activity. See Nicola, Annu. Rev. Biochem. 58: 45-77 (1989). In general, the test for biological activity should provide analysis for the desired result, such as an increase or decrease in biological activity (as compared to non-altered G-CSF), different biological activity (as compared to non-altered G-CSF), receptor affinity analysis, or serum half-life analysis. This list is not exhaustive, and those skilled in the art will recognize other assays useful for testing for the desired end result.

The 3H-thymidine assay was performed using standard methods. Bone marrow was obtained from sacrificed female BALB/c mice. Bone marrow cells were briefly suspended, centrifuged, and resuspended in a growth medium. A 160 ul aliquot containing approximately 10,000 cells was placed into each well of a 96 well microtiter plate. Samples of the purified G-CSF analog (prepared as above) were added to each well and incubated for 68 hours. Tritiated thymidine was added to the wells and allowed to incubate for 5 additional hours. After the 5 hour incubation, the cells were harvested, filtered, and thoroughly rinsed. The filters were added to a vial containing scintillation fluid, and the beta emissions were counted (LKB Betaplate scintillation counter). Standards and analogs were analyzed in triplicate, and samples that fell substantially above or below the standard curve were re-assayed at the proper dilution. The results reported here are the average of the triplicate analog data relative to the unaltered recombinant human G-CSF standard results.
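The text does not say how the analog counts were converted into a percentage of the standard. One common approach, sketched below as an assumption, is to fit the linear part of the standard's dose-response (counts per minute against log concentration), read the analog's mean count back off that line, and express the resulting standard-equivalent concentration as a percentage of the analog's nominal concentration. The range check mirrors the re-assay rule above; the function names are hypothetical.

```python
from math import log10
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def relative_activity(std_conc, std_cpm, sample_conc, sample_cpm):
    """Percent activity of an analog relative to the standard.

    std_conc    : standard concentrations (any consistent unit)
    std_cpm     : one triplicate of counts per standard concentration
    sample_conc : nominal concentration of the analog, same unit
    sample_cpm  : triplicate counts for the analog
    """
    xs = [log10(c) for c in std_conc]
    ys = [mean(t) for t in std_cpm]
    slope, intercept = fit_line(xs, ys)
    y = mean(sample_cpm)
    if not min(ys) <= y <= max(ys):
        raise ValueError("outside the standard curve; re-assay at another dilution")
    apparent = 10 ** ((y - intercept) / slope)   # standard-equivalent concentration
    return 100.0 * apparent / sample_conc
```

A log-linear fit is only valid over the roughly linear part of the dose-response, which is one reason off-curve samples are re-assayed at another dilution.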

H. HPLC Analysis 


High pressure liquid chromatography was performed on purified samples of analog. Although peak 
position on a reverse phase HPLC column is not a definitive indication of structural similarity between two 
proteins, analogs which have similar retention times may have the same type of hydrophobic interactions 
with the HPLC column as the non-altered molecule. This is one indication of an overall similar structure. 

Samples of the analog and of the non-altered recombinant human G-CSF were analyzed on a reverse-phase
(0.46 x 25 cm) Vydac 214TP54 column (Separations Group, Inc., Hesperia, CA). The purified analog
G-CSF samples were prepared in a 20 mM acetate, 40 mM NaCl solution buffered at pH 5.2 to a final
concentration of 0.1 mg/ml to 5 mg/ml, depending on how the analog performed on the column. Varying
amounts (depending on the concentration) were loaded onto the HPLC column, which had been equilibrated
with an aqueous solution containing 1% isopropanol, 52.8% acetonitrile, and 0.38% trifluoroacetic acid (TFA).
The samples were subjected to a gradient of 0.86%/minute acetonitrile and 0.002% TFA.
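Read literally, the gradient above raises the acetonitrile content by 0.86 percentage points per minute. The text does not state the start point, end point or run length. If the gradient starts from the 52.8% equilibration composition and rises linearly, the mobile-phase composition at a given time can be estimated as follows. The function names are illustrative only.

```python
START_ACN_PERCENT = 52.8        # equilibration composition stated in the text
GRADIENT_PERCENT_PER_MIN = 0.86  # gradient slope stated in the text


def acetonitrile_percent(t_min: float,
                         start: float = START_ACN_PERCENT,
                         rate: float = GRADIENT_PERCENT_PER_MIN) -> float:
    """Acetonitrile content (%) at t_min minutes into an assumed linear gradient, capped at 100%."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    return min(100.0, start + rate * t_min)


def minutes_to_reach(target_percent: float,
                     start: float = START_ACN_PERCENT,
                     rate: float = GRADIENT_PERCENT_PER_MIN) -> float:
    """Minutes needed for the assumed linear gradient to reach target_percent."""
    if not start <= target_percent <= 100.0:
        raise ValueError("target must lie between the starting composition and 100%")
    return (target_percent - start) / rate


if __name__ == "__main__":
    print(round(acetonitrile_percent(10), 2))  # 61.4
    print(round(minutes_to_reach(100.0), 1))   # 54.9
```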

I. Results 

Presented below are the results of the above biological assays and HPLC analysis. Biological activity is
the average of triplicate data and is reported as a percentage of the control standard (non-altered G-CSF).
Relative HPLC peak position is the position of the analog G-CSF peak relative to the control standard
(non-altered G-CSF) peak. The "+" or "-" symbols indicate whether the analog HPLC peak preceded or
followed the control standard peak (in minutes). Not all of the variants had been analyzed for relative HPLC
peak position, and only those so analyzed are included below. Also presented are the American Type Culture
Collection designations for E. coli host cells containing the nucleic acids coding for the present analogs, as
prepared above.
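A small helper can make the sign convention for relative HPLC peak position explicit. The sketch reads "in advance of" as meaning that the analog elutes earlier than the control standard, so its retention time is shorter; that case is reported with "+". The function name and the two-decimal formatting are illustrative choices, not part of the original method.

```python
def relative_peak_position(analog_rt_min: float, standard_rt_min: float) -> str:
    """Signed retention-time offset (minutes) of an analog peak versus the standard peak.

    '+' : analog peak precedes (elutes before) the control standard peak
    '-' : analog peak follows (elutes after) the control standard peak
    """
    delta = standard_rt_min - analog_rt_min
    if delta > 0:
        sign = "+"
    elif delta < 0:
        sign = "-"
    else:
        sign = ""
    return f"{sign}{abs(delta):.2f}"


if __name__ == "__main__":
    print(relative_peak_position(20.0, 21.5))  # +1.50
    print(relative_peak_position(22.0, 21.5))  # -0.50
```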
1. Identification of Structure-Function Relationships

The first step used to design the present analogs was to determine which moieties are necessary for the
structural integrity of the G-CSF molecule. This was done at the amino acid residue level, although the
atomic level is also available for analysis. Modification of the residues necessary for structural integrity
results in a change in the overall structure of the G-CSF molecule. This may or may not be desirable,
depending on the analog one wishes to produce. The working examples here were designed to maintain the
overall structural integrity of the G-CSF molecule, for the purpose of maintaining binding of the
analog to the G-CSF receptor (as used in this section below, the "G-CSF receptor" refers to the natural
G-CSF receptor found on hematopoietic cells). It was assumed, and confirmed by the studies presented here,
that G-CSF receptor binding is a necessary step for at least one biological activity, as determined by the
above biological assays.

As can be seen from the figures, G-CSF (here, recombinant human met-G-CSF) is an antiparallel
four-alpha-helical bundle with a left-handed twist and overall dimensions of 45 Å x 30 Å x 24 Å. The four
helices within the bundle are referred to as helices A, B, C and D, and their connecting loops are known as
the AB, BC and CD loops. The helix crossing angles range from -167.5° to -159.4°. Helices A, B, and C
are straight, whereas helix D contains two kinds of structural characteristics, at Gly 150 and Ser 160 (of the
recombinant human met-G-CSF). Overall, the G-CSF molecule is a bundle of four helices connected in
series by external loops. This structural information was then correlated with known functional information. It
was known that residues (including methionine at position 1) 47, 23, 24, 20, 21, 44, 53, 113, 110, 28 and
114 may be modified, and the effect on biological activity would be substantial.

The majority of single mutations that lowered biological activity were centered around two regions of
G-CSF that are separated by 30 Å and are located on different faces of the four-helix bundle. One region
involves interactions between the A helix and the D helix. This is further confirmed by the presence of salt
bridges in the non-altered molecule, as follows:
Atom (helix)        Atom (helix)        Distance (Å)
Tyr 166 OH (D)      Arg 23 N1 (A)       3.3
Arg 23 N1 (A)       Gln 26 OE1 (A)      3.1
Gln 159 NE2 (D)     Gln 26 O (A)        3.3

Distances reported here are for molecule A, as indicated in FIGURE 5 (wherein three G-CSF
molecules crystallized together and were designated A, B, and C). As can be seen, there is a web of salt
bridges between helix A and helix D, which act to stabilize the helix A structure and therefore affect the
overall structure of the G-CSF molecule.
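Contacts such as those tabulated above are found by measuring interatomic distances in the refined coordinates. The sketch below makes two assumptions. First, the coordinates are Cartesian and in ångströms (for example, read from a coordinate file for molecule A). Second, it uses a 3.5 Å cutoff, chosen only because the tabulated distances fall between 3.1 and 3.3 Å. The function name is hypothetical. The atom labels and coordinates in the usage lines are toy values, not G-CSF data.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def close_pairs(atoms: Dict[str, Coord],
                pairs: List[Tuple[str, str]],
                cutoff: float = 3.5) -> List[Tuple[str, str, float]]:
    """Return (atom1, atom2, distance) for each candidate pair within `cutoff` angstroms."""
    hits = []
    for first, second in pairs:
        d = math.dist(atoms[first], atoms[second])  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            hits.append((first, second, round(d, 2)))
    return hits


if __name__ == "__main__":
    toy_atoms = {"X1": (0.0, 0.0, 0.0), "Y1": (3.0, 0.0, 0.0), "Z1": (0.0, 8.0, 0.0)}
    print(close_pairs(toy_atoms, [("X1", "Y1"), ("X1", "Z1")]))  # [('X1', 'Y1', 3.0)]
```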

The area centering around residues Glu 20, Arg 23 and Lys 24 is found on the hydrophilic face of the
A helix (residues 20-37). Substitution of the non-charged alanine residue at positions 20 and 23 resulted in
HPLC retention times similar to that of the non-altered molecule, indicating similarity in structure. Alteration
of these sites altered the biological activity (as indicated by the present assays). Substitution at Lys 24 also
altered biological activity, but did not result in an HPLC retention time similar to those of the other two alterations.

The second site at which alteration lowered biological activity involves the AB helix. Changing glutamine
at position 47 to alanine (analog no. 19, above) reduced biological activity (in the thymidine uptake assay) to
zero. The AB helix is predominantly hydrophobic, except at the amino and carboxy termini; it contains one
turn of a 3₁₀ helix. There is a histidine at each terminus (His 44 and His 56) and an additional glutamate
at residue 46, which has the potential to form a salt bridge to His 44. Fourier-transform infrared
spectroscopic (FTIR) analysis of the analog suggests that this analog is structurally similar to the non-altered
recombinant G-CSF molecule. Further testing showed that this analog would not crystallize under the same
conditions as the non-altered recombinant molecule.

Alterations at the carboxy terminus (Gln 174, Arg 167 and Arg 170) had little effect on biological activity.
In contrast, deletion of the last eight residues (167-175) lowered biological activity. These results may
indicate that the deletion destabilizes the overall structure, which prevents the mutant from binding properly
to the G-CSF receptor (and thus from initiating signal transduction).

Generally, for the G-CSF internal core (the internal four-helix bundle lacking the external loops), the
hydrophobic internal residues are essential for structural integrity. For example, in helix A, the internal
hydrophobic residues are (with methionine being position 1 as in FIGURE 1) Phe 14, Cys 18, Val 22, Ile 25,
Ile 32 and Leu 36. The other hydrophobic residues (again with the met at position 1) are: helix B, Ala 72,
Leu 76, Leu 79, Leu 83, Tyr 86, Leu 90, Leu 93; helix C, Leu 104, Leu 107, Val 111, Ala 114, Ile 118, Met
122; and helix D, Val 154, Val 158, Phe 161, Val 164, Val 168, Leu 172.
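An idealized helical wheel shows why these helix A positions can form one internal face. The sketch assumes a perfect alpha helix with 100° of rotation per residue (3.6 residues per turn). Real helices deviate from this, so it is an approximation, and the function names are hypothetical. Take the helix A core positions listed above (14, 18, 22, 25, 32, 36) and the hydrophilic-face residues Glu 20, Arg 23 and Lys 24 discussed earlier. The core positions fall within an 80° arc. The charged residues fall within a separate 100° arc on the far side of the wheel.

```python
from typing import Iterable, List

DEGREES_PER_RESIDUE = 100.0  # idealized alpha helix, 3.6 residues per turn


def wheel_angle(position: int, reference: int) -> float:
    """Angle (0-360 degrees) of `position` on a helical wheel that puts `reference` at 0."""
    return ((position - reference) * DEGREES_PER_RESIDUE) % 360.0


def covered_arc(angles: Iterable[float]) -> float:
    """Smallest arc (degrees) that contains every angle in `angles`."""
    a: List[float] = sorted(angles)
    gaps = [later - earlier for earlier, later in zip(a, a[1:])]
    gaps.append(a[0] + 360.0 - a[-1])  # wrap-around gap
    # The largest empty gap is the part of the circle not needed to cover the points.
    return 360.0 - max(gaps)


HELIX_A_CORE = [14, 18, 22, 25, 32, 36]  # internal hydrophobic residues (met = 1)
HELIX_A_CHARGED = [20, 23, 24]           # Glu 20, Arg 23, Lys 24

if __name__ == "__main__":
    core = [wheel_angle(p, 14) for p in HELIX_A_CORE]
    charged = [wheel_angle(p, 14) for p in HELIX_A_CHARGED]
    print(sorted(core), covered_arc(core))        # [0.0, 0.0, 20.0, 40.0, 40.0, 80.0] 80.0
    print(sorted(charged), covered_arc(charged))  # [180.0, 240.0, 280.0] 100.0
```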

The above biological activity data, from the presently prepared G-CSF analogs, demonstrate that
modification of the external loops interferes least with the overall structure of G-CSF. Preferred loops for
analog preparation are the AB loop and the CD loop. The loops are relatively flexible structures compared to
the helices, and they may contribute to the proteolysis of the molecule. G-CSF is relatively fast acting in vivo,
as the purpose the molecule serves is to generate a response to a biological challenge, i.e., to selectively
stimulate neutrophils. The G-CSF turnover rate is also relatively fast. The flexibility of the loops may provide
a "handle" by which proteases attach to and inactivate the molecule. The loops may be modified to prevent
protease degradation with no loss in biological activity, via retention of the overall structure of non-modified G-CSF.

This phenomenon is probably not limited to the G-CSF molecule but may also be common to other
molecules with known similar overall structures, as presented in Figure 2. Alteration of the external loops of,
for example, hGH, interferon beta, IL-2, GM-CSF and IL-4 may produce the least change to the overall structure.

The external loops on the GM-CSF molecule are not as flexible as those found on the G-CSF molecule, and
this may indicate a longer serum life, consistent with the broader biological activity of GM-CSF. Thus, the
external loops of GM-CSF may be modified by releasing them from the beta-sheet structure,
which may make the loops more flexible (similar to those of G-CSF) and therefore make the molecule more
susceptible to protease degradation (and thus increase the turnover rate).

Alteration of these external loops may be effected by stabilizing the loops through connection to one or more
of the internal helices. Connecting means are known to those in the art, such as the formation of a beta
sheet, salt bridge, disulfide bond or hydrophobic interactions, and other means are available. Also, one or
more moieties, such as one or more amino acid residues or portions thereof, may be deleted to prepare an
abbreviated molecule and thus eliminate certain portions of the external loops.

Thus, by alteration of the external loops, preferably the AB loop (amino acids 58-72 of r-hu-met-G-CSF)
or the CD loop (amino acids 119-145 of r-hu-met-G-CSF), and less preferably the amino terminus (amino
acids 1-10), one may modify the biological function without eliminating G-CSF receptor
binding. For example, one may: (1) increase the half-life (or prepare an oral dosage form, for example) of the
G-CSF molecule by, for example, decreasing the ability of proteases to act on the G-CSF molecule or adding
chemical modifications to the G-CSF molecule, such as one or more polyethylene glycol molecules, or
enteric coatings for oral formulation, which would act to change some characteristic of the G-CSF molecule
as described above, such as increasing serum or other half-life or decreasing antigenicity; (2) prepare a
hybrid molecule, such as by combining G-CSF with part or all of another protein, such as another cytokine or
another protein that effects signal transduction via entry into the cell through a G-CSF receptor
transport mechanism; or (3) increase the biological activity, as in, for example, the ability to selectively
stimulate neutrophils (as compared to a non-modified G-CSF molecule). The possibilities are not limited to
these examples.
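Before any mutagenesis is attempted, a rational-design workflow could screen proposed substitution sites against the regions named above. The following hypothetical helper is not a procedure from the original work. It encodes the loop ranges given above (AB loop 58-72, CD loop 119-145, amino terminus 1-10) and the internal hydrophobic core residues listed earlier. Where these ranges overlap (Ala 72 and Met 122), core membership takes precedence.

```python
from typing import Dict

# Residue ranges (methionine = position 1) as stated in the text; the helper itself is hypothetical.
PREFERRED_LOOPS: Dict[str, range] = {
    "AB loop": range(58, 73),    # residues 58-72
    "CD loop": range(119, 146),  # residues 119-145
}
LESS_PREFERRED: Dict[str, range] = {"amino terminus": range(1, 11)}  # residues 1-10

# Internal hydrophobic core residues listed for helices A-D.
CORE_HYDROPHOBIC = frozenset({
    14, 18, 22, 25, 32, 36,        # helix A
    72, 76, 79, 83, 86, 90, 93,    # helix B
    104, 107, 111, 114, 118, 122,  # helix C
    154, 158, 161, 164, 168, 172,  # helix D
})


def classify_position(position: int) -> str:
    """Label a residue position by how suitable it looks for modification."""
    if position in CORE_HYDROPHOBIC:
        return "core (avoid)"  # checked first, so core wins where ranges overlap
    for name, span in PREFERRED_LOOPS.items():
        if position in span:
            return f"{name} (preferred)"
    for name, span in LESS_PREFERRED.items():
        if position in span:
            return f"{name} (less preferred)"
    return "other"


if __name__ == "__main__":
    for pos in (5, 47, 60, 72, 130):
        print(pos, classify_position(pos))
```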

Another aspect observed from the above data is that stabilizing surface interactions may affect
biological activity. This is apparent from comparing analogs 23 and 40. Analog 23 contains a substitution of
the neutral alanine residue for the charged aspartic acid residue at position 28, and this substitution
resulted in a 50% increase in biological activity (as measured by the disclosed thymidine uptake assays).
The aspartic acid residue at position 28 has a surface interaction with the aspartic acid residue at position 113.
Because both residues are negatively charged, there is a certain amount of instability (due to the repulsion
of like-charged moieties). When, however, the aspartic acid at position 113 is replaced with the neutral
alanine, the biological activity drops to zero (in the present assay system). This indicates that the aspartic
acid at position 113 is critical to biological activity, and that elimination of the aspartic acid at position 28
serves to increase the effect that the aspartic acid at position 113 possesses.

The domains required for G-CSF receptor binding were also determined based on the above analogs
prepared and the G-CSF structure. The G-CSF receptor binding domain is located at residues (with
methionine being position 1) 11-57 (between the A and AB helix) and 100-118 (between the B and C
helices). One may also prepare abbreviated molecules capable of binding to a G-CSF receptor and initiating
signal transduction for selectively stimulating neutrophils by changing the external loop structure while leaving
the receptor binding domains intact.

Residues essential for biological activity, and presumably for G-CSF receptor binding or signal transduction,
have been identified. Two distinct sites are located on two different regions of the secondary structure.
What is here called "Site A" is located on a helix that is constrained by salt bridge contacts between two
other members of the helical bundle. The second site, "Site B", is located on a relatively more flexible helix,
AB. The AB helix is potentially more sensitive to local pH changes because of the type and position of the
residues at the carboxy and amino termini. This flexible helix may be functionally important for a
conformationally induced fit when binding to the G-CSF receptor. Additionally, the extended portion of
the D helix is also indicated to be a G-CSF receptor binding domain, as ascertained by direct mutational
and indirect comparative protein structure analysis. Deletion of the carboxy-terminal end of r-hu-met-G-CSF
reduces activity, as it does for hGH; see Cunningham and Wells, Science 244: 1081-1084 (1989). Cytokines
with similar structures, such as IL-6 and GM-CSF with predicted similar topology, also center their
biological activity along the carboxy end of the D helix; see Bazan, Immunology Today 11: 350-354 (1990).

A comparison of the structures and of the positions of receptor binding determinants between G-CSF
and hGH suggests that both molecules have similar means of signal transduction. Two separate receptor
binding sites have been identified for hGH; De Vos et al., Science 255: 306-312 (1992). One of these
binding sites (called "Site I") is formed by residues on the exposed faces of hGH's helix 1, the connecting
region between helix 1 and 2, and helix 4. The second binding site (called "Site II") is formed by surface
residues of helix 1 and helix 3.

The receptor binding determinants identified for G-CSF are located in the same relative positions as
those identified for hGH. The G-CSF receptor binding site located in the connecting region
between helix A and B on the AB helix (Site A) is similar in position to that reported for a small piece of
helix (residues 38-47) of hGH. A single point mutation in the AB helix of G-CSF significantly reduces
biological activity (as ascertained in the present assays), indicating a role for this region in the receptor-ligand
interface. Binding of the G-CSF receptor may destabilize the 3₁₀-helical nature of this region and induce a
conformational change improving the binding energy of the ligand/G-CSF receptor complex.

In the hGH receptor complex, the first helix of the bundle donates residues to both of the binding sites
required to dimerize the hGH receptor. Mutational analysis of the corresponding helix of G-CSF (helix A) has
identified three residues that are required for biological activity. Of these three residues, Glu 20 and Lys
24 lie on one face of the helical bundle towards helix C, whereas the side chain of Arg 23 (in two of the
three molecules in the asymmetric unit) points to the face of the bundle towards helix D. The positions of the
side chains of these biologically important residues indicate that, similar to hGH, G-CSF may have a
second receptor binding site along the interface between helix A and helix C. In contrast with the
hGH molecule, the amino terminus of G-CSF has a limited biological role, as deletion of the first 11 residues
has little effect on the biological activity.

As indicated above (see FIGURE 2, for example), G-CSF has a topological similarity with other
cytokines. A correlation of the structure with previous biochemical studies, mutational analysis and direct
comparison of specific residues of the hGH receptor complex indicates that G-CSF has two receptor
binding sites. Site A lies along the interface of the A and D helices and includes residues in the small AB
helix. Site B also includes residues in the A helix but lies along the interface between helices A and C. The
conservation of structure and of the relative positions of biologically important residues between G-CSF and
hGH is one indication of a common method of signal transduction, in which the receptor is bound in two places.
It therefore follows that G-CSF analogs possessing altered G-CSF receptor binding domains may be prepared
by alteration at either of the G-CSF receptor binding sites (residues 20-57 and 145-175).

Knowledge of the three-dimensional structure of G-CSF, correlated with the composition of the protein,
makes possible a systematic, rational method for preparing G-CSF analogs. The above working examples
have demonstrated that the limits on the size and polarity of the side chains within the core of the
structure dictate how much change the molecule can tolerate before the overall structure is changed.
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Amgen Inc. 
(ii) TITLE OF INVENTION: 

(iii) NUMBER OF SEQUENCES: 110 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Amgen Inc. 

(B) STREET: Amgen Center, 1840 DeHavilland Drive 

(C) CITY: Thousand Oaks 

(D) STATE: California 

(E) COUNTRY: United States of America

(F) ZIP: 91320-1789 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 565 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: 30..554

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCTAGAAAAA ACCAAGGAGG TAATAAATA ATG ACT CCA TTA GGT CCT GCT TCT 
Met Thr Pro Leu Gly Pro Ala Ser 
1 5 

TCT CTG CCG CAA AGC TTT CTG CTG AAA TGT CTG GAA CAG GTT CGT AAA 
Ser Leu Pro Gln Ser Phe Leu Leu Lys Cys Leu Glu Gln Val Arg Lys
10 15 20

ATC CAG GGT GAC GGT GCT GCA CTG CAA GAA AAA CTG TGC GCT ACT TAC 
Ile Gln Gly Asp Gly Ala Ala Leu Gln Glu Lys Leu Cys Ala Thr Tyr
AAA CTG TGC CAT CCG GAA GAG CTG GTA CTG CTG GGT CAT TCT CTT GGG
Lys Leu Cys His Pro Glu Glu Leu Val Leu Leu Gly His Ser Leu Gly 



GCT GGT TGT CTG TCT CAA CTG CAT TCT GGT CTG TTC CTG TAT CAG GGT 
Ala Gly Cys Leu Ser Gln Leu His Ser Gly Leu Phe Leu Tyr Gln Gly
75 80 85 

CTT CTG CAA GCT CTG GAA GGT ATC TCT CCG GAA CTG GGT CCG ACT CTG 
Leu Leu Gln Ala Leu Glu Gly Ile Ser Pro Glu Leu Gly Pro Thr Leu
90 95 100 

GAC ACT CTG CAG CTA GAT GTA GCT GAC TTT GCT ACT ACT ATT TGG CAA 
Asp Thr Leu Gln Leu Asp Val Ala Asp Phe Ala Thr Thr Ile Trp Gln
105 110 115 120 

CAG ATG GAA GAG CTC GGT ATG GCA CCA GCT CTG CAA CCG ACT CAA GGT 
Gln Met Glu Glu Leu Gly Met Ala Pro Ala Leu Gln Pro Thr Gln Gly
125 130 135 

GCT ATG CCG GCA TTC GCT TCT GCA TTC CAG CGT CGT GCA GGA GGT GTA 
Ala Met Pro Ala Phe Ala Ser Ala Phe Gln Arg Arg Ala Gly Gly Val
140 145 150 

CTG GTT GCT TCT CAT CTG CAA TCT TTC CTG GAA GTA TCT TAC CGT GTT 
Leu Val Ala Ser His Leu Gln Ser Phe Leu Glu Val Ser Tyr Arg Val
155 160 165 



(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15 

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30 

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60 

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80 

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95 



(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
CTTTCTGCTG CGTTGTCTGG AACA 

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
ACAGGTTCGT CGTATCCAGG GTG 
(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CACTGCAAGA ACGTCTGTGC GCT 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CGCTACTTAC CGTCTGTGCC ATC 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
CTTTCTGCTG CGTTGTCTGG AACA 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ACAGGTTCGT CGTATCCAGG GTG

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 
CACTGCAAGA ACGTCTGTGC GCT 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CTTTCTGCTG CGTTGTCTGG AACA

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ACAGGTTCGT CGTATCCAGG GTG

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 
CGCTACTTAC CGTCTGTCCC ATC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 
CTTTCTGCTG CGTTGTCTGG AACA 

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CACTGCAAGA ACGTCTGTGC GCT 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CGCTACTTAC CGTCTGTGCC ATC 
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
ACAGGTTCGT CGTATCCAGG GTG 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CACTGCAAGA ACGTCTGTGC GCT 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CGCTACTTAC CGTCTGTGCC ATC 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 






EP 0 612 846 A1 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
CTTTCTGCTG CGTTGTCTGG AACA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
ACAGGTTCGT CGTATCCAGG GTG 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CACTGCAAGA ACGTCTGTGC GCT 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CGCTACTTAC CGTCTGTGCC ATC 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TCTGCTGAAA GCTCTGGAAC AGG 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CTTGTCCATC TGAAGCTCTT CAG 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
GAAAAACTGT CCGCTACTTA CAAACTGTCC CATCCGG 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TTCGTAAAAT CGCGGGTGAC GG 
(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
TCATCTGGCT GCGCCGTAAT AG 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CCGTGTTCTG GCTCATCTGG CT 

(2) INFORMATION FOR SEQ ID NO: 29: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GAAGTATCTT ACGCTGTTCT GCGT 

(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GAAGTATCTT ACTAAGTTCT GCGTC 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 
CGCTACTTAC GCACTGTGCC AT 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
CAAACTGTGC AAGCCGGAAG AG

(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
CATCCGGAAG CACTGGTACT GC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
GGAACAGGTT GCTAAAATCC AGG 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GAACAGGTTC GTGCGATCCA GGGTG 



(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
TCCAGGGTGC CGGTGCTGC 
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 
AAGAGCTCGG TGAGGCACCA GCT

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CTCAAGGTGC TGAGCCGGCA TTC 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GAGCTCGGTC TGGCACCAGC 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
TCAAGGTGCT CTGCCGGCAT T 

(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 
TCTGCCGCAA GCCTTTCTGC TGA 

(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
CTTTCTGCTG GCATGTCTGG AACA 

(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
CTATTTGGCA AGCGATGGAA GAGC 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 
CAGATGGAAG CGCTCGGTAT G 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
GAGCTCGGTC TGGCACCAGC 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
TCAAGGTGCT CTGCCGGCAT T 

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 
GAAATGTCTG GCACAGGTTC GT 

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TTCCGGAGCG CACAGTTTG 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CGAGAAGGCC TCGGGTGTCA AAC 



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
ATGCCAAATT GCAGTAGCAA AG 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
ACAACGGTTT AACGTCATCG TTTC 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
ATCAGCTACT GCTAGCTGCA GA 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
TCAGTCGATG ACGATCGACG TCT 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:
TTACGAACCG CTTCCAGACA TT 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:
TAAAATGCTT GGCGAAGGTC TGTAA 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
GTAGCAAATG CAGCTACATC TA 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: 
CATCATCGTT TACGTCGATG TAGAT 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59: 
CCAAGAGAAG CACCCAGCAG 






EP 0 612 846 A1 



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
AGGGTTCTCT TCGTGGGTCG TC 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
CACTGGCGGT GATAATGAGC

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
CTAGGCCAGG CATTACTGG 

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
CCACTGGCGG TGATACTGAG C 



(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
AGCAGAAAGC TTTCCGGCAG AGAAGAAGCA GGA 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
GCCGCAAAGC TTTCTGCTGA AATGTCTGGA AGAGGTTCGT AAAATCCAGG GTGA 



(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
CTGGAATGCA GAAGCAAATG CCGGCATAGC ACCTTCAGTC GGTTGCAGAG CTGGTGCCA 
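Because the LENGTH field and the printed sequence of each oligonucleotide entry above were captured separately, they can be cross-checked mechanically. The following is a minimal Python sketch, not part of the listing: the helper names and the `ENTRIES` table are hypothetical, and the GC fraction and reverse complement are added only as common primer properties. It strips the 10-base grouping spaces, confirms the alphabet is A/C/G/T, and compares the base count with the declared LENGTH.

```python
# Hypothetical helpers for sanity-checking oligonucleotide entries
# transcribed from a sequence listing; they are not part of the listing.


def clean(seq: str) -> str:
    """Drop the 10-base grouping spaces and verify the A/C/G/T alphabet."""
    bases = "".join(seq.split()).upper()
    unexpected = set(bases) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return bases


def length_matches(seq: str, declared: int) -> bool:
    """True if the cleaned sequence has exactly the declared LENGTH."""
    return len(clean(seq)) == declared


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    bases = clean(seq)
    return (bases.count("G") + bases.count("C")) / len(bases)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[b] for b in reversed(clean(seq)))


# A few entries transcribed from the listing: SEQ ID NO -> (sequence, LENGTH).
ENTRIES = {
    38: ("AAGAGCTCGG TGAGGCACCA GCT", 23),
    50: ("CGAGAAGGCC TCGGGTGTCA AAC", 23),
    64: ("AGCAGAAAGC TTTCCGGCAG AGAAGAAGCA GGA", 33),
    66: ("CTGGAATGCA GAAGCAAATG CCGGCATAGC ACCTTCAGTC GGTTGCAGAG CTGGTGCCA", 59),
}

if __name__ == "__main__":
    for seq_id, (seq, declared) in ENTRIES.items():
        print(seq_id, length_matches(seq, declared), f"GC={gc_fraction(seq):.2f}")
```

A check like this would flag OCR splits such as "2 3" in a LENGTH field, or stray characters inside a sequence line, before the entries are reused.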



(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Arg Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 
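The protein entries give each analog as a complete 175-residue sequence in three-letter code, so the change an analog carries has to be read off by comparing entries. Below is a minimal Python sketch of that comparison; the function names and constants are hypothetical, and residues are numbered from the initial Met as position 1, as in the listing. Only residues 1 to 32 of SEQ ID NO: 67 and 68 are transcribed, because as printed the two entries are identical beyond that point.

```python
# Hypothetical helper: list the positions at which two equal-length
# three-letter-code protein sequences differ (numbering starts at 1).


def substitutions(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (position, residue_in_a, residue_in_b) for every mismatch."""
    ra, rb = a.split(), b.split()
    if len(ra) != len(rb):
        raise ValueError("sequences differ in length; an alignment is needed")
    return [(i, x, y) for i, (x, y) in enumerate(zip(ra, rb), start=1) if x != y]


# Residues 1-32 of SEQ ID NO: 67 and 68, transcribed from the listing.
SEQ67_1_32 = ("Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu "
              "Arg Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu")
SEQ68_1_32 = ("Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu "
              "Lys Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu")

if __name__ == "__main__":
    print(substitutions(SEQ67_1_32, SEQ68_1_32))
    # -> [(17, 'Arg', 'Lys'), (24, 'Lys', 'Arg')]
```

Position-by-position comparison is only valid for entries of equal length; a deletion variant needs a different check.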

(2) INFORMATION FOR SEQ ID NO:69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Arg Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Arg Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Arg Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Arg Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO:72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Arg Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Arg Leu Cys Ala Thr Tyr Arg Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Arg Leu Cys Ala Thr Tyr Arg Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Arg Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Arg Leu Cys Ala Thr Tyr Arg Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Arg Cys Leu Glu Gln Val Arg Arg Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Arg Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Glu Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Ser Ala Thr Tyr Lys Leu Ser His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Ala Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Ala Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Ala His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 81:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Ala Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 174 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Val Leu Arg His Leu Ala Gln Pro
165 170 174 
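SEQ ID NO: 82 is the only 174-residue entry among SEQ ID NO: 67 to 91; as printed, its final row lacks the Arg that the 175-residue entries carry at position 167. Below is a minimal Python sketch for locating a single-residue deletion of this kind; the function name and constants are hypothetical. The sketch removes each residue of the longer sequence in turn and keeps the positions where the result matches the shorter one.

```python
# Hypothetical helper: positions (numbered as in the longer sequence) whose
# removal turns `longer` into `shorter`. A run of identical residues yields
# several equivalent positions, since the deletion is ambiguous there.


def deletion_positions(longer: str, shorter: str, first_position: int = 1) -> list[int]:
    a, b = longer.split(), shorter.split()
    if len(a) != len(b) + 1:
        raise ValueError("expected exactly one residue fewer in `shorter`")
    return [first_position + i for i in range(len(a)) if a[:i] + a[i + 1:] == b]


# Residues 161-175 as printed in most 175-residue entries, and the
# final row (residues 161-174) of SEQ ID NO: 82.
ROW_175 = "Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro"
ROW_82 = "Phe Leu Glu Val Ser Tyr Val Leu Arg His Leu Ala Gln Pro"

if __name__ == "__main__":
    print(deletion_positions(ROW_175, ROW_82, first_position=161))  # -> [167]
```

Brute-force removal is sufficient here because only one residue differs in count. Comparing full-length sequences with several indels would need a proper alignment.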

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Ala Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys Lys Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Ala Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Ala Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Ala Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Ala Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Ala Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 

(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 175 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Glu Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175 
(2) INFORMATION FOR SEQ ID NO: 91:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Glu Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 92:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Leu Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 93:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Leu Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Ala Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Glu Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Glu Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175
(2) INFORMATION FOR SEQ ID NO: 96:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Glu Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Glu Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Glu Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 97:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Gly Phe Leu Leu
1 5 10 15

Lys Cys Leu Ala Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 98:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Leu Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Leu Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160



(2) INFORMATION FOR SEQ ID NO: 99:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ala Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 100:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Ala Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160
(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Ala Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 102:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Ala Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 103:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys Ala Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 104:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly Ala Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 105:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Ala Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175
(2) INFORMATION FOR SEQ ID NO: 106:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Ala Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 107:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Ala Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 108:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Ala Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Ala Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 109:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Ala Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175

(2) INFORMATION FOR SEQ ID NO: 110:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110:

Met Thr Pro Leu Gly Pro Ala Ser Ser Leu Pro Gln Ser Phe Leu Leu
1 5 10 15

Lys Cys Leu Glu Gln Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu
20 25 30

Gln Glu Lys Leu Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu
35 40 45

Val Leu Leu Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser
50 55 60

Cys Pro Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His
65 70 75 80

Ser Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
85 90 95

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val Ala
100 105 110

Asp Val Ala Thr Ala Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala
115 120 125

Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe Ala Ser Ala
130 135 140

Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser His Leu Gln Ser
145 150 155 160

Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His Leu Ala Gln Pro
165 170 175
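The listings above differ from one another at only a few positions. As a minimal sketch (an illustration, not part of the disclosure), the Python helpers below convert a listing from three-letter to one-letter code and report where it departs from a reference. The reference string is assembled from the rows left unaltered across these listings and is assumed to correspond to Figure 1, numbered from the N-terminal Met as position 1; the names `parse_listing` and `substitutions` are hypothetical.

```python
from typing import List

# Three-letter to one-letter amino acid codes (standard IUPAC abbreviations).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# Assumed reference (Figure 1), 175 residues, position 1 = N-terminal Met.
REFERENCE = "".join([
    "MTPLGPASSL", "PQSFLLKCLE", "QVRKIQGDGA", "ALQEKLCATY", "KLCHPEELVL",
    "LGHSLGIPWA", "PLSSCPSQAL", "QLAGCLSQLH", "SGLFLYQGLL", "QALEGISPEL",
    "GPTLDTLQLD", "VADFATTIWQ", "QMEELGMAPA", "LQPTQGAMPA", "FASAFQRRAG",
    "GVLVASHLQS", "FLEVSYRVLR", "HLAQP",
])


def parse_listing(text: str) -> str:
    """Turn residue rows into one-letter code, dropping the numbering rows."""
    return "".join(THREE_TO_ONE[tok] for tok in text.split() if not tok.isdigit())


def substitutions(variant: str, reference: str = REFERENCE) -> List[str]:
    """Report point substitutions as 'Xaa<n>->Yaa<n>' (equal-length sequences only)."""
    if len(variant) != len(reference):
        raise ValueError("this sketch does not handle insertions or deletions")
    return [
        f"{ONE_TO_THREE[ref]}{pos}->{ONE_TO_THREE[var]}{pos}"
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Usage: the residue 113-128 row of SEQ ID NO: 110 in one-letter code.
print(parse_listing("Asp Val Ala Thr Ala Ile Trp Gln Gln Met Glu Glu Leu Gly Met Ala 115 120 125"))
# -> DVATAIWQQMEELGMA
```

Spliced into the reference at positions 113-128, that row gives `['Phe114->Val114', 'Thr117->Ala117']` from `substitutions`. Insertions or deletions (such as the deletion in claim 31) would need a proper alignment rather than this position-by-position comparison.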



Claims



1. A method for preparing a G-CSF analog comprising the steps of: 

(a) viewing information conveying the three dimensional structure of a G-CSF molecule; 



EP 0 612 846 A1 



(b) selecting from said viewed information at least one site on said G-CSF molecule for alteration; 

(c) preparing a G-CSF molecule having such alteration; and 

(d) optionally, testing such G-CSF molecule for a desired characteristic. 

2. A computer-based method for preparing a G-CSF analog comprising the steps of:

(a) providing computer expression of the three dimensional structure of a G-CSF molecule; 

(b) selecting from said computer expression at least one site on said G-CSF molecule for alteration; 

(c) preparing a G-CSF molecule having such alteration; and

(d) optionally, testing such G-CSF molecule for a desired characteristic.

3. A method for preparing a G-CSF analog with the aid of a computer comprising:

(a) providing said computer with the means for displaying the three dimensional structure of a G-CSF
molecule, including displaying the composition of moieties of said G-CSF molecule, preferably
displaying the three dimensional location of each amino acid, and more preferably displaying the
three dimensional location of each atom of a G-CSF molecule;

(b) viewing said display; 

(c) selecting a site on said display for alteration in the composition of said molecule or the location 
of a moiety; and 

(d) preparing a G-CSF analog with such alteration. 

4. A computer-based method for preparing a G-CSF analog comprising the steps of:

(a) viewing the three dimensional structure of a G-CSF molecule via a computer, said computer
having been previously programmed (i) to express the coordinates of a G-CSF molecule in three
dimensional space, and (ii) to allow for entry of information for alteration of said G-CSF expression
and viewing thereof;

(b) selecting a site on said visual image of said G-CSF molecule for alteration; 

(c) entering information for said alteration on said computer; 

(d) viewing a three dimensional structure of said altered G-CSF molecule via said computer; 

(e) optionally repeating steps (a)-(e) above; 

(f) preparing a G-CSF analog with said alteration; and

(g) optionally testing said G-CSF analog for a desired characteristic. 

5. In a computer-based apparatus for displaying the three dimensional structure of a molecule, the
improvement comprising means for correlating said three dimensional structure of a G-CSF molecule
with the composition of said G-CSF molecule.

6. A method for crystallization of a protein comprising the steps of: 

(a) combining, optionally by automated means, aqueous aliquots of said protein with either (i)
aliquots of a salt solution, each aliquot having a different concentration of salt; or (ii) aliquots of a
precipitant solution, each aliquot having a different concentration of precipitant;

(b) selecting at least one of said combined aliquots, said selection based on the formation of
precrystalline forms, or, if no precrystalline forms are so produced, increasing the protein starting
concentration of said aqueous aliquots of protein and repeating step (a);

(c) after said salt or said precipitant concentration is selected, repeating step (a) with said previously
unselected solution in the presence of said selected concentration; and

(d) repeating step (b) and step (a) until a crystal of desired quality is obtained. 

7. A method of claim 6 wherein each combination pursuant to step (a) is performed in a range of pH. 

8. A method of claim 6 wherein said combining of step (a) is done in the presence of a nucleation
initiation unit.
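The screening order of claims 6 to 8 can be read as a nested search. The Python sketch below is one possible reading, not the disclosed procedure: a salt series is scanned first, the protein concentration is raised whenever no drop shows precrystalline forms, and a precipitant series is then scanned at the selected salt concentration. The `observe` callback (standing in for inspecting a drop), the "first hit" selection rule, the concentration units and all default values are hypothetical; the quality judgment of step (d) and the pH range of claim 7 are left to the caller.

```python
from typing import Callable, Optional, Sequence, Tuple

# observe(protein, salt, precipitant) -> True when precrystalline forms appear.
# Hypothetical stand-in for inspecting a drop; units are arbitrary here.
Observe = Callable[[float, float, float], bool]


def first_hit(protein: float, series: Sequence[float], observe: Observe,
              fixed_salt: Optional[float] = None) -> Optional[float]:
    """Steps (a)-(b): scan one concentration series, return the first level giving precrystals."""
    for level in series:
        if fixed_salt is None:
            salt, precipitant = level, 0.0        # salt series, no precipitant yet
        else:
            salt, precipitant = fixed_salt, level  # precipitant series at the chosen salt
        if observe(protein, salt, precipitant):
            return level
    return None


def two_stage_screen(salt_series: Sequence[float], precipitant_series: Sequence[float],
                     observe: Observe, protein: float = 5.0, protein_step: float = 5.0,
                     max_protein: float = 50.0) -> Optional[Tuple[float, float, float]]:
    """Return (protein, salt, precipitant) for the first condition found, else None."""
    while protein <= max_protein:
        salt = first_hit(protein, salt_series, observe)
        if salt is not None:
            # Step (c): repeat (a) with the previously unselected solution
            # (precipitant) in the presence of the selected salt concentration.
            precipitant = first_hit(protein, precipitant_series, observe, fixed_salt=salt)
            if precipitant is not None:
                return protein, salt, precipitant
        # No usable condition at this protein level: raise the starting
        # protein concentration and repeat, as in step (b).
        protein += protein_step
    return None


# Toy observation model, for illustration only.
def toy_observe(protein: float, salt: float, precipitant: float) -> bool:
    return protein >= 10 and salt >= 0.2 and (precipitant == 0.0 or precipitant >= 0.1)


print(two_stage_screen([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], toy_observe))  # (10.0, 0.2, 0.1)
```

Taking the first hit keeps the sketch short; a laboratory implementation would more likely rank all hits and iterate per step (d).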

9. A G-CSF analog having an amino acid sequence different from that of Figure 1 in that:
(a) the N-terminal methionine is optional; and
(b) one or more of amino acids 58-72 (i) is substituted with one or more different amino acids, (ii) is
deleted, or (iii) is chemically modified.
10. A G-CSF analog of claim 9 wherein said analog is more resistant to proteolysis than a G-CSF molecule
of Figure 1.

11. A G-CSF analog of claim 10 wherein at least one of said amino acids is chemically modified by the
addition of a polyethylene glycol molecule.

12. A G-CSF analog having an amino acid sequence different from that of Figure 1 in that: 

(a) the N-terminal methionine is optional; and 

(b) one or more of amino acids 119-125 (i) is substituted with one or more different amino acids, (ii) is
deleted, or (iii) is chemically modified.

13. A G-CSF analog of claim 12 wherein said analog is more resistant to proteolysis than a G-CSF
molecule of Figure 1.

14. A G-CSF analog of claim 12 wherein at least one of said amino acids is chemically modified by the
addition of a polyethylene glycol molecule.
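Claims 9 and 12 identify regions by position in Figure 1. The lysine positions named in claims 17 to 19 (17, 35 and 41) fall on lysines in the sequence listings only when the N-terminal Met is counted as position 1, so that numbering appears to apply here. The short Python sketch below assumes it and returns the residues of a region; the reference string is taken from the unaltered rows of the sequence listings and is assumed to match Figure 1, and `region` is a hypothetical helper.

```python
# Assumed Figure 1 reference, 175 residues, position 1 = N-terminal Met.
REFERENCE = "".join([
    "MTPLGPASSL", "PQSFLLKCLE", "QVRKIQGDGA", "ALQEKLCATY", "KLCHPEELVL",
    "LGHSLGIPWA", "PLSSCPSQAL", "QLAGCLSQLH", "SGLFLYQGLL", "QALEGISPEL",
    "GPTLDTLQLD", "VADFATTIWQ", "QMEELGMAPA", "LQPTQGAMPA", "FASAFQRRAG",
    "GVLVASHLQS", "FLEVSYRVLR", "HLAQP",
])


def region(start: int, end: int) -> str:
    """Return residues start..end inclusive, using 1-based Figure 1 numbering."""
    if not 1 <= start <= end <= len(REFERENCE):
        raise ValueError(f"positions must lie within 1..{len(REFERENCE)}")
    return REFERENCE[start - 1:end]


print(region(58, 72))    # region of claim 9  -> PWAPLSSCPSQALQL
print(region(119, 125))  # region of claim 12 -> WQQMEEL
```

In this sketch residues keep their Figure 1 numbers even when the optional N-terminal Met is absent; indexing directly into the shorter chain would be offset by one.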

15. A G-CSF molecule having the AB loop stabilized by connecting such loop to one or more of helices A, 
B, C, or D. 

16. A G-CSF molecule having the CD loop stabilized by connecting such loop to one or more of helices A, 
B, C, or D. 

17. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17->Arg17 and the N-terminal methionine is optional.

18. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys35->Arg35 and the N-terminal methionine is optional.

19. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys41->Arg41 and the N-terminal methionine is optional.

20. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17,24,35->Arg17,24,35 and the N-terminal methionine is optional.

21. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17,35,41->Arg17,35,41 and the N-terminal methionine is optional.

22. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys24,35,41->Arg24,35,41 and the N-terminal methionine is optional.

23. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17,24,35,41->Arg17,24,35,41 and the N-terminal methionine is optional.

24. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17,24,41->Arg17,24,41 and the N-terminal methionine is optional.
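Claims 17 to 24 replace one or more lysines with arginine. The Python sketch below assumes the same reference string as the sequence listings (taken to be Figure 1, position 1 = Met), lists every lysine position, and applies a substitution map that first checks the expected residue, so a mistyped position fails loudly. `positions_of` and `substitute` are hypothetical helpers, not part of the disclosure.

```python
from typing import Dict, List, Tuple

# Assumed Figure 1 reference, 175 residues, position 1 = N-terminal Met.
REFERENCE = "".join([
    "MTPLGPASSL", "PQSFLLKCLE", "QVRKIQGDGA", "ALQEKLCATY", "KLCHPEELVL",
    "LGHSLGIPWA", "PLSSCPSQAL", "QLAGCLSQLH", "SGLFLYQGLL", "QALEGISPEL",
    "GPTLDTLQLD", "VADFATTIWQ", "QMEELGMAPA", "LQPTQGAMPA", "FASAFQRRAG",
    "GVLVASHLQS", "FLEVSYRVLR", "HLAQP",
])


def positions_of(residue: str, seq: str = REFERENCE) -> List[int]:
    """1-based positions at which a one-letter residue occurs."""
    return [pos for pos, aa in enumerate(seq, start=1) if aa == residue]


def substitute(seq: str, changes: Dict[int, Tuple[str, str]]) -> str:
    """Apply {position: (expected_old, new)}; refuse if the old residue does not match."""
    residues = list(seq)
    for pos, (old, new) in changes.items():
        if residues[pos - 1] != old:
            raise ValueError(f"position {pos} is {residues[pos - 1]}, not {old}")
        residues[pos - 1] = new
    return "".join(residues)


lysines = positions_of("K")
print(lysines)  # [17, 24, 35, 41]

# All four lysines to arginine, as in claim 23.
all_arg = substitute(REFERENCE, {pos: ("K", "R") for pos in lysines})
print("K" in all_arg)  # False
```

Because the reference has exactly these four lysines, the single, triple and quadruple Lys->Arg analogs of these claims are all subsets of the same substitution map.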

25. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln68->Glu68 and the N-terminal methionine is optional.

26. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Cys37,43->Ser37,43 and the N-terminal methionine is optional.

27. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln26->Ala26 and the N-terminal methionine is optional.
28. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln174->Ala174 and the N-terminal methionine is optional.

29. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Arg170->Ala170 and the N-terminal methionine is optional.

30. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Arg167->Ala167 and the N-terminal methionine is optional.

31. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that there is a deletion at position 167 and the N-terminal methionine is
optional.

32. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys41->Ala41 and the N-terminal methionine is optional.

33. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that His44->Lys44 and the N-terminal methionine is optional.

34. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Glu47->Ala47 and the N-terminal methionine is optional.

35. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Arg23->Ala23 and the N-terminal methionine is optional.

36. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys24->Ala24 and the N-terminal methionine is optional.

37. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Glu20 -> Ala20 and the N-terminal methionine is optional.

38. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Asp28 -> Ala28 and the N-terminal methionine is optional.

39. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Met127 -> Glu127 and the N-terminal methionine is optional.

40. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Met138 -> Glu138 and the N-terminal methionine is optional.

41. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Met127 -> Leu127 and the N-terminal methionine is optional.

42. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Met138 -> Leu138 and the N-terminal methionine is optional.

43. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Cys18 -> Ala18 and the N-terminal methionine is optional.

44. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln12,21 -> Glu12,21 and the N-terminal methionine is optional.

45. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Gln12,21,68 -> Glu12,21,68 and the N-terminal methionine is optional.

46. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Glu20 -> Ala20; Ser13 -> Gly13 and the N-terminal methionine is optional.
47. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Met127,138 -> Leu127,138 and the N-terminal methionine is optional.

48. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Ser13 -> Ala13 and the N-terminal methionine is optional.

49. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Lys17 -> Ala17 and the N-terminal methionine is optional.

50. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln121 -> Ala121 and the N-terminal methionine is optional.

51. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Gln21 -> Ala21 and the N-terminal methionine is optional.

52. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that His44 -> Ala44 and the N-terminal methionine is optional.

53. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that His53 -> Ala53 and the N-terminal methionine is optional.

54. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Asp110 -> Ala110 and the N-terminal methionine is optional.

55. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Asp113 -> Ala113 and the N-terminal methionine is optional.

56. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Thr117 -> Ala117 and the N-terminal methionine is optional.

57. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Asp28 -> Ala28; Asp110 -> Ala110 and the N-terminal methionine is
optional. 

58. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence
differs from that of Figure 1 in that Glu124 -> Ala124 and the N-terminal methionine is optional.

59. A G-CSF analog, optionally in a pharmaceutically effective carrier, wherein the amino acid sequence 
differs from that of Figure 1 in that Phe114 -> Val114; Thr117 -> Ala117 and the N-terminal methionine is
optional.

60. The G-CSF analog DNA-containing plasmids and bacterial host cells transformed therewith available
from the American Type Culture Collection under the accession numbers ATCC 69184, 69185, 69186,
69187, 69188, 69189, 69190, 69191, 69192, 69193, 69194, 69195, 69196, 69197, 69198, 69199, 69200,
69201, 69202, 69203, 69204, 69205, 69206, 69207, 69208, 69209, 69210, 69211, 69212, 69213, 69214,
69215, 69216, 69217, 69218, 69219, 69220, 69221, 69222, 69223, 69224, 69225 and 69226.
Met Thr Pro Leu Gly Pro Ala
TCTAGAAAAAACCAAGGAGGTAATAAATA ATG ACT CCA TTA GGT CCT GCT

Ser Ser Leu Pro Gln Ser Phe Leu Leu Lys Cys Leu Glu Gln
TCT TCT CTG CCG CAA AGC TTT CTG CTG AAA TGT CTG GAA CAG

Val Arg Lys Ile Gln Gly Asp Gly Ala Ala Leu Gln Glu Lys Leu
GTT CGT AAA ATC CAG GGT GAC GGT GCT GCA CTG CAA GAA AAA CTG

Cys Ala Thr Tyr Lys Leu Cys His Pro Glu Glu Leu Val Leu Leu
TGC GCT ACT TAC AAA CTG TGC CAT CCG GAA GAG CTG GTA CTG CTG

Gly His Ser Leu Gly Ile Pro Trp Ala Pro Leu Ser Ser Cys Pro
GGT CAT TCT CTT GGG ATC CCG TGG GCT CCG CTG TCT TCT TGT CCA 

Ser Gln Ala Leu Gln Leu Ala Gly Cys Leu Ser Gln Leu His Ser
TCT CAA GCT CTT CAG CTG GCT GGT TGT CTG TCT CAA CTG CAT TCT 

Gly Leu Phe Leu Tyr Gln Gly Leu Leu Gln Ala Leu Glu Gly Ile
GGT CTG TTC CTG TAT CAG GGT CTT CTG CAA GCT CTG GAA GGT ATC 

Ser Pro Glu Leu Gly Pro Thr Leu Asp Thr Leu Gln Leu Asp Val
TCT CCG GAA CTG GGT CCG ACT CTG GAC ACT CTG CAG CTA GAT GTA 

Ala Asp Phe Ala Thr Thr Ile Trp Gln Gln Met Glu Glu Leu Gly
GCT GAC TTT GCT ACT ACT ATT TGG CAA CAG ATG GAA GAG CTC GGT 

Met Ala Pro Ala Leu Gln Pro Thr Gln Gly Ala Met Pro Ala Phe
ATG GCA CCA GCT CTG CAA CCG ACT CAA GGT GCT ATG CCG GCA TTC 

Ala Ser Ala Phe Gln Arg Arg Ala Gly Gly Val Leu Val Ala Ser
GCT TCT GCA TTC CAG CGT CGT GCA GGA GGT GTA CTG GTT GCT TCT 

His Leu Gln Ser Phe Leu Glu Val Ser Tyr Arg Val Leu Arg His
CAT CTG CAA TCT TTC CTG GAA GTA TCT TAC CGT GTT CTG CGT CAT 

Leu Ala Gln Pro OC AM

CTG GCT CAG CCG TAA TAG AATTC 



FIGURE 1 
FIGURE U